Benchmark Dataset and Sample Formulation

Xiangeng Wang; Yanjing Wang; Zhenyu Xu; Yi Xiong; Dong-Qing Wei

This method is extracted from research article: Front Pharmacol, Sep 2019

ATC-NLSP: Prediction of the Classes of Anatomical Therapeutic Chemicals Using a Network-Based Label Space Partition Method

DOI: 10.3389/fphar.2019.00971

We used the same dataset as the previous study (Cheng et al., 2017b) to facilitate model comparison. The dataset consists of 3,883 drugs, each labeled with one or more of the 14 main ATC classes. It is a tidy dataset with no missing values or contradictory records. The UpSet visualization technique (Lex et al., 2014) was used for quantitative analysis of the interactions among label sets.
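The label-set analysis described above largely comes down to counting how many drugs carry each exact combination of classes, which is essentially what an UpSet plot displays. A minimal Python sketch on hypothetical toy data follows; the drug names, label assignments, and function names are illustrative assumptions and are not taken from the benchmark dataset.

```python
from collections import Counter

# Hypothetical toy labels (NOT the benchmark data): each placeholder drug
# maps to the set of main ATC classes it is assumed to belong to.
toy_labels = {
    "drug_a": {"A"},
    "drug_b": {"A", "C"},
    "drug_c": {"C"},
    "drug_d": {"A", "C"},
    "drug_e": {"A", "C", "N"},
}


def label_set_counts(labels):
    """Count how many samples carry each exact combination of labels."""
    return Counter(frozenset(classes) for classes in labels.values())


def class_sizes(labels):
    """Count how many samples belong to each individual class."""
    return Counter(c for classes in labels.values() for c in classes)


combo_counts = label_set_counts(toy_labels)
sizes = class_sizes(toy_labels)

# List the label combinations from most to least frequent.
for combo, count in combo_counts.most_common():
    print(sorted(combo), count)
```

Using `frozenset` keys keeps each label combination hashable and independent of label order, so `{"A", "C"}` and `{"C", "A"}` are counted together.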

Then, we adopted the method of Cheng et al. (2017b) to represent the drug samples. The dataset can be formulated in set notation as the union of the elements in each class: $S = S_{1} \cup S_{2} \cup \dots \cup S_{14}$ (1), and a sample D can be represented by concatenating the following three types of features.

A 14-dimensional vector, $D^{\text{Int}} = [\Phi_{1}\ \Phi_{2}\ \Phi_{3}\ \dots\ \Phi_{14}]^{T}$ (2), which represents its maximum interaction score $\Phi_{i}$ (Kotera et al., 2012) with the drugs in each of the 14 subsets $S_{i}$.

A 14-dimensional vector, $D^{\text{StrSim}} = [\Psi_{1}\ \Psi_{2}\ \Psi_{3}\ \dots\ \Psi_{14}]^{T}$ (3), which represents its maximum structural similarity score $\Psi_{i}$ (Kotera et al., 2012) with the drugs in each of the 14 subsets $S_{i}$.

A 14-dimensional vector, $D^{\text{FigSim}} = [T_{1}\ T_{2}\ T_{3}\ \dots\ T_{14}]^{T}$ (4), which represents its molecular fingerprint similarity score $T_{i}$ (Xiao et al., 2013) with the drugs in each of the 14 subsets $S_{i}$.
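Each of these vectors holds one score per class. For the maximum-score features, this can be sketched as taking, for each class, the highest pairwise score between the query drug and the members of that class. The sketch below is an illustration under stated assumptions, not the authors' implementation: the Tanimoto coefficient on bit sets is only a stand-in scoring function to make the example runnable, the query drug is assumed to be excluded from its own class, an empty class is assumed to score 0.0, and the toy example uses 3 classes instead of 14. All names and fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def max_score_per_class(query, classes, score):
    """Return one value per class: the highest score between `query` and any
    other member of that class (assumption: 0.0 if there is no other member)."""
    features = []
    for members in classes:
        # Assumption: a drug is not compared with itself.
        scores = [score(query, m) for m in members if m != query]
        features.append(max(scores, default=0.0))
    return features


# Hypothetical toy fingerprints and class memberships (3 classes, not 14).
fingerprints = {
    "d1": {1, 2, 3},
    "d2": {2, 3, 4},
    "d3": {7, 8},
    "d4": {1, 2, 3, 4},
}
toy_classes = [["d1", "d2"], ["d3"], ["d4"]]


def fingerprint_score(a, b):
    """Stand-in pairwise score: Tanimoto similarity of the two toy fingerprints."""
    return tanimoto(fingerprints[a], fingerprints[b])


vector_d1 = max_score_per_class("d1", toy_classes, fingerprint_score)
print(vector_d1)  # [0.5, 0.0, 0.75]
```

Passing the scoring function as an argument means the same helper could be reused with an interaction score or a structural similarity score in place of the stand-in.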

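Therefore, a given drug D is formulated by:

$D = D^{\text{Int}} \oplus D^{\text{StrSim}} \oplus D^{\text{FigSim}} = [\Phi_{1}\ \dots\ \Phi_{14}\ \ \Psi_{1}\ \dots\ \Psi_{14}\ \ T_{1}\ \dots\ T_{14}]^{T}$ (5)

where ⊕ denotes the orthogonal sum, so that the three 14-dimensional vectors are concatenated into a single 42-dimensional vector.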
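The concatenation of the three 14-dimensional vectors into one sample vector can be sketched in Python as follows. The function name, argument names, and placeholder values are assumptions made for illustration; only the ordering (interaction, structural similarity, fingerprint similarity) and the dimensions follow the text.

```python
def concatenate_features(d_int, d_strsim, d_figsim, n_classes=14):
    """Concatenate three per-class score vectors into one sample vector.

    Each input must contain exactly `n_classes` values; the result has
    3 * n_classes entries (42 for the 14 main ATC classes).
    """
    parts = (("d_int", d_int), ("d_strsim", d_strsim), ("d_figsim", d_figsim))
    for name, vec in parts:
        if len(vec) != n_classes:
            raise ValueError(f"{name} has {len(vec)} values, expected {n_classes}")
    return list(d_int) + list(d_strsim) + list(d_figsim)


# Placeholder scores (NOT real data), chosen so the three blocks are easy to tell apart.
toy_int = [0.1] * 14
toy_strsim = [0.2] * 14
toy_figsim = [0.3] * 14

sample = concatenate_features(toy_int, toy_strsim, toy_figsim)
print(len(sample))  # 42
```

Positions 0 to 13 of `sample` hold the interaction scores, 14 to 27 the structural similarity scores, and 28 to 41 the fingerprint similarity scores.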

For more details, refer to Cheng et al. (2017b).

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
