Rich DDI Triple Encoder

This protocol is extracted from research article:

Drug-Drug Interaction Predictions via Knowledge Graph and Text Embedding: Instrument Validation Study

**JMIR Med Inform**, Jun 24, 2021; DOI: 10.2196/28277

Procedure

The interaction *l* between 2 drug entities, *u* and *v*, in rich DDI triples (*u*, *l*, *v*) ∈ *T* can also be represented as a translation in low-dimensional space. We set **u**, **v** ∈ R^{k} and **l** ∈ R^{d}. The energy function *z*_{dte}(*u*, *l*, *v*) is defined as follows:

where *b*_{2} is a bias constant and **M**_{l} ∈ R^{k×d} is the projection matrix. Following the analogous method in the basic triple encoder, the conditional likelihoods of all existing triples are maximized as follows:

Note that in equation 5, **l** is the relation representation obtained from the label set *l* = {*n*_{1}, *n*_{2}, …}. Its construction is described in the remainder of this section.
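The translation idea described above can be illustrated with a small Python sketch. The energy equation itself is not shown in this text, so the sketch assumes a common projection-then-translate form, *b*_{2} − ½‖**u****M**_{l} + **l** − **v****M**_{l}‖². The squared L2 norm, the factor ½, the softmax over candidate tail drugs, and every function name and number below are assumptions for illustration, not the article's definition.

```python
import math


def project(vec, matrix):
    """Multiply a k-dim row vector by a k x d matrix, giving a d-dim vector."""
    k, d = len(matrix), len(matrix[0])
    assert len(vec) == k, "entity vector must have k components"
    return [sum(vec[i] * matrix[i][j] for i in range(k)) for j in range(d)]


def energy(u, l, v, M_l, b2=0.0):
    """Assumed form: b2 - 0.5 * ||u M_l + l - v M_l||^2 (higher is more plausible)."""
    pu, pv = project(u, M_l), project(v, M_l)
    squared = sum((a + r - b) ** 2 for a, r, b in zip(pu, l, pv))
    return b2 - 0.5 * squared


def tail_likelihood(u, l, candidates, target, M_l, b2=0.0):
    """Softmax of energies over candidate tail drugs (an assumed normalization)."""
    scores = [energy(u, l, v, M_l, b2) for v in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[target] / sum(exps)


# Toy example with k = 3 and d = 2; all numbers are made up.
M_l = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
u = [1.0, 0.0, 5.0]
l = [1.0, 1.0]
v_good = [2.0, 1.0, -3.0]  # u M_l + l equals v_good M_l exactly
v_bad = [0.0, 0.0, 0.0]

print(energy(u, l, v_good, M_l))
print(energy(u, l, v_bad, M_l))
print(tail_likelihood(u, l, [v_good, v_bad], 0, M_l))
```

In this toy setting the tail drug that satisfies the translation receives the higher energy and therefore most of the probability mass. Maximizing such likelihoods over observed triples is what pulls **u****M**_{l} + **l** toward **v****M**_{l}.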

A deep autoencoder is employed to construct the relation representation **l** ∈ R^{d} for a rich DDI triple (*u*, *l*, *v*) ∈ *T*. Specifically, a DDI relation *l* is described by a set of labels *l* = {*n*_{1}, *n*_{2}, …} ⊆ *L*. The corresponding binary vector for *l* is initialized as **s** = (*s*_{1}, …, *s*_{|L|}), where *s*_{i} = 1 if the *i*-th label of *L* belongs to *l*, and *s*_{i} = 0 otherwise.

where *f* is the activation function and *K* is the number of layers. Here, **h**^{(i)}, **W**^{(i)}, and **b**^{(i)} represent the hidden vector, transformation matrix, and bias vector in the *i*-th layer, respectively.
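The label encoding and the layer-by-layer computation described above can be sketched as follows. This is a minimal illustration that assumes the usual stacked form **h**^{(i)} = *f*(**W**^{(i)}**h**^{(i−1)} + **b**^{(i)}) with **h**^{(0)} = **s**. The label names, the weights, and the use of tanh in every layer (including the decoder) are hypothetical choices made only so the example runs.

```python
import math


def multi_hot(labels, vocabulary):
    """Binary vector s: s[i] = 1.0 if vocabulary[i] is one of the relation's labels."""
    present = set(labels)
    return [1.0 if name in present else 0.0 for name in vocabulary]


def dense(x, W, b, f):
    """One layer f(W x + b), with W given as a list of rows."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def forward(s, layers, f=math.tanh):
    """Pass s through all K layers and return the hidden vectors h^(1)..h^(K)."""
    hidden = []
    h = s
    for W, b in layers:
        h = dense(h, W, b, f)
        hidden.append(h)
    return hidden


# Hypothetical label vocabulary L and a relation described by two labels.
vocabulary = ["increase", "decrease", "serum concentration", "metabolism"]
s = multi_hot(["increase", "serum concentration"], vocabulary)

# K = 2 layers with made-up weights: encoder 4 -> 2, decoder 2 -> 4.
layers = [
    ([[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.5, -0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]], [0.0, 0.0, 0.0, 0.0]),
]
K = len(layers)
hidden = forward(s, layers)

relation_embedding = hidden[K // 2 - 1]  # l = h^(K/2), the encoder output
reconstruction = hidden[-1]              # h^(K), the decoder output

print(s)
print(relation_embedding)
print(reconstruction)
```

The middle hidden vector serves as the relation representation, and the final one is the reconstruction that is compared with **s** during training.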

There are 2 parts to the autoencoder: an encoder and a decoder. The encoder employs the *tanh* activation function to obtain the DDI relation representation **l** = **h**^{(K/2)}. The decoder decodes the embedding vector **l** to obtain a reconstructed vector. Intuitively, PRD should then minimize the distance between **s** and its reconstruction, because the reconstructed vector should be similar to **s**. However, because of data sparsity, **s** usually contains far more zero elements than nonzero elements. This biases the decoder toward reconstructing the zero elements rather than the nonzero ones, which conflicts with our purpose. To overcome this obstacle, different weights are set for different elements, and the following objective function is maximized:

where *b*_{3} is a bias constant, **x** is a weight vector, and ⊙ denotes the Hadamard product. For **x**, *x*_{i} = 1 if *s*_{i} = 0, and *x*_{i} > 1 otherwise, so that errors on nonzero elements are weighted more heavily.

where *S* is the set of binary vectors of all DDI relations. The likelihood of reconstructing the binary vector **s** of a relation *l* can be defined as follows:

By maximizing the likelihoods of the encoding and the decoding for all described relations *l*, the objective function can be defined as follows:
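The effect of the element weights can be checked with a short Python sketch. It assumes the reconstruction energy has the form *b*_{3} − ‖(**s** − **ŝ**) ⊙ **x**‖ with an L2 norm, where **ŝ** stands for the decoder output. The symbol **ŝ**, the norm, the name `beta`, and its value of 5 are hypothetical and chosen only for illustration; the weight applied to nonzero elements is not given in this text.

```python
import math


def weighted_distance(s, s_hat, beta=5.0):
    """L2 norm of (s - s_hat) multiplied elementwise by the weight vector x.

    x[i] = beta where s[i] is nonzero and 1.0 where s[i] is zero
    (beta is a hypothetical value, not taken from the article).
    """
    x = [beta if si != 0 else 1.0 for si in s]
    return math.sqrt(sum(((si - hi) * xi) ** 2
                         for si, hi, xi in zip(s, s_hat, x)))


def reconstruction_energy(s, s_hat, b3=0.0, beta=5.0):
    """Assumed energy: b3 minus the weighted distance (higher is better)."""
    return b3 - weighted_distance(s, s_hat, beta)


s = [1.0, 0.0, 0.0, 0.0]           # sparse label vector with one active label
all_zero = [0.0, 0.0, 0.0, 0.0]    # decoder ignores the active label
extra_one = [1.0, 1.0, 0.0, 0.0]   # decoder keeps it but adds a spurious label

# Without weighting (beta = 1) both mistakes cost the same.
print(weighted_distance(s, all_zero, beta=1.0))
print(weighted_distance(s, extra_one, beta=1.0))

# With weighting, dropping the active label is penalized more heavily.
print(reconstruction_energy(s, all_zero))
print(reconstruction_energy(s, extra_one))
```

With uniform weights, an all-zero reconstruction is as cheap as any single-element error, which is the sparsity problem described above. Raising the weight on nonzero positions makes the all-zero output the more costly mistake.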

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

Note: The content above has been extracted from a research article, so it may not display correctly.
