We utilized the same dataset as the previous study (Cheng et al., 2017b) to facilitate model comparison. This dataset consists of 3,883 drugs, and each drug is labeled with at least one or more of 14 main ATC classes. It is a tidy dataset where no missing value and contradictory record. The UpSet visualization technique (Lex et al., 2014) was used for quantitative analysis of interactions of label sets.
Then, we adopted the same method provided by (Cheng et al., 2017b) to represent the drug samples. The dataset can be formulated in set notation as the union of elements in each class: (1), and a sample D can be represented by concatenating the following three types of features.
A 14-dimentional vector, D Int = [Φ1Φ2Φ3 … Φ14]T (2), which represents its maximum interaction score Φi (Kotera et al., 2012) with the drugs in each of the 14 .
A 14-dimentional vector, D StrSim = [Ψ1Ψ2Ψ3 … Ψ14]T (3) which represents its maximum structural similarity score Ψi (Kotera et al., 2012) with the drugs in each of the 14 .
A 14-dimentional vector, D FigSim = [T1T2T3 … T14]T (4), which represents its molecular fingerprint similarity score Ti (Xiao et al., 2013) with the drugs in each of the 14 .
Therefore, a given drug D is formulated by:
Where ⊕ represents the symbol for orthogonal sum and where
For more details, refer to Cheng et al. (2017b).
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.
Tips for asking effective questions
+ Description
Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.