Datasets

Rayane Achebouche
Anne Tromelin
Karine Audouze
Olivier Taboureau

This study is based on the integration of two different data sets: (i) data on chemical-odor relationships and (ii) data on chemical-olfactory receptor relationships.

We extracted chemical-odor relationships from two separate sources: The Good Scents Company (TGSC) Database [46] (as of January 2021) and the Leffingwell Database [47]. Both databases link each compound and its chemical structure to an odor description given as several odor notes. From the TGSC database, we obtained 27,779 chemicals, of which 5,659 are related to one or several odor notes. From the Leffingwell database, we obtained 6,054 compounds that are related to one or several odor notes. We merged the outcomes from both databases and eliminated duplicated information: duplicate entries with the same structure (based on InChIKey encoding [48]) but different names (synonyms) were removed. Odor notes from the Leffingwell database were matched to those of TGSC, which served as the reference. To limit the complexity of the models and avoid misclassification due to poor representation of an odor note, odor notes associated with fewer than 20 chemicals were not considered in this analysis. After all these steps, we obtained a dataset of 5,955 compounds and 160 odor notes. Each compound is associated with between 1 and at most 10 odor notes, in the order proposed by TGSC.
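The merging and filtering steps described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function name `merge_odor_sources`, the input layout (a mapping from InChIKey to an ordered list of odor notes), and the lowercase normalization used as a stand-in for matching Leffingwell notes to the TGSC vocabulary are all assumptions. Dropping compounds that are left without any note and truncating to 10 notes are likewise assumptions inferred from the counts reported in the text.

```python
from collections import defaultdict

MIN_COMPOUNDS_PER_NOTE = 20   # threshold stated in the text
MAX_NOTES_PER_COMPOUND = 10   # upper bound stated in the text


def merge_odor_sources(reference, secondary, min_count=MIN_COMPOUNDS_PER_NOTE):
    """Merge two {inchikey: [odor notes]} mappings and drop rare notes.

    `reference` plays the role of TGSC, so its note order comes first.
    Keying on the InChIKey collapses synonyms of the same structure.
    """
    merged = {}
    for source in (reference, secondary):
        for inchikey, notes in source.items():
            kept = merged.setdefault(inchikey, [])
            for note in notes:
                # Assumption: simple normalization stands in for the
                # vocabulary matching between the two databases.
                note = note.strip().lower()
                if note and note not in kept:
                    kept.append(note)

    # Count how many distinct compounds carry each note.
    counts = defaultdict(int)
    for notes in merged.values():
        for note in notes:
            counts[note] += 1

    # Keep well-represented notes; drop compounds left with none (assumption).
    filtered = {}
    for inchikey, notes in merged.items():
        frequent = [n for n in notes if counts[n] >= min_count]
        if frequent:
            filtered[inchikey] = frequent[:MAX_NOTES_PER_COMPOUND]
    return filtered
```

Called as `merge_odor_sources(tgsc, leffingwell)`, where both arguments are hypothetical dictionaries built from the two databases, the function returns one entry per unique structure with only the odor notes that pass the frequency threshold.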

Compounds tested experimentally on olfactory receptors were gathered from several data sources, including OdorDB [49], ODORactor [50], OlfactionDB [51], and the literature. For the purpose of this study, we first considered human receptors when building the learning models, collecting data on 74 human olfactory receptors and 365 compounds. In a second step, human receptors that are orthologs of rodent olfactory receptors, and for which bioactivity has been measured, were also included in the development of the learning models. After aggregating these data, we obtained a dataset of 445 distinct compounds tested on 106 distinct olfactory receptors.
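Aggregating compound-receptor records from several sources amounts to taking the union of (compound, receptor) pairs and counting the distinct members on each side. The following minimal sketch assumes each source is an iterable of such pairs, with compounds identified by a structure key such as an InChIKey. The function names and the receptor name normalization are hypothetical and are not taken from the study.

```python
def aggregate_pairs(*sources):
    """Return the set of unique (compound, receptor) pairs across sources."""
    pairs = set()
    for source in sources:
        for compound, receptor in source:
            # Assumption: trimming and upper-casing receptor names is
            # enough to make records from different sources comparable.
            pairs.add((compound.strip(), receptor.strip().upper()))
    return pairs


def summarize(pairs):
    """Return (number of distinct compounds, number of distinct receptors)."""
    compounds = {compound for compound, _ in pairs}
    receptors = {receptor for _, receptor in pairs}
    return len(compounds), len(receptors)
```

Applied to the aggregated records, `summarize` would yield the kind of totals reported above (distinct compounds and distinct receptors), provided the identifiers have been harmonized beforehand.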

The datasets generated and analysed during the current study are available in Table S1 of the supplementary material.
