A typical KG organizes knowledge as a set of fact triples, each of which states the relation between 2 entities and thus comprises a head entity, a relation, and a tail entity, denoted as (h, r, t).

First, a basic drug KG is constructed by collecting drug-related entities and the relations among them. We follow the data model for drug-related extraction defined by Kamdar and Musen [27], whose entity and relation types are summarized in Table 1. Specifically, we use SPARQL federation queries [20] to extract triples that contain 4 types of drug-related entities (E1~E4) and 5 types of biological relations (R1~R5) from a variety of biomedical sources (eg, Bio2RDF [18]). These extracted triples are defined as basic triples in our drug KG according to definition 1 (basic triple): B = (E, R) is a set of basic triples in the form (h, r, t), where E = E1 ∪ E2 ∪ … ∪ E4 is a set of entities; R = R1 ∪ R2 ∪ … ∪ R5 is a set of relations; h, t ∈ E; and r ∈ R.

Table 1. Entities and relations of basic triples in Kamdar and Musen [27].

For instance, we can extract “(etanercept, hasTarget, lymphotoxin-alpha)” as a basic triple in our drug KG, which indicates that there is a relationship “hasTarget” linking etanercept to lymphotoxin-alpha, meaning that lymphotoxin-alpha is one of the targets of etanercept.
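As an illustration of definition 1, the following minimal sketch stores basic triples as (h, r, t) tuples and recovers the entity set E and relation set R from them. The `Triple` type and the helper names are hypothetical and are not part of the original method; only the example triple comes from the text.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """A basic triple (h, r, t): head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


def entities_of(triples: Set[Triple]) -> Set[str]:
    """Collect every entity that appears as a head or a tail."""
    return {t.head for t in triples} | {t.tail for t in triples}


def relations_of(triples: Set[Triple]) -> Set[str]:
    """Collect every relation name used by the triples."""
    return {t.relation for t in triples}


# The example triple from the text: lymphotoxin-alpha is a target of etanercept.
basic_triples = {Triple("etanercept", "hasTarget", "lymphotoxin-alpha")}
```

Using an immutable tuple type keeps triples hashable, so the set B cannot contain duplicate facts.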

A specific DDI between 2 drugs can be captured by multiple key phrases extracted from biomedical text, as shown in Figure 2. Hence, we collect biomedical DDI text documenting drug pairs (eg, DDI corpus [28], MEDLINE abstracts, and DrugBank DDI documents). We remove all stop words from the raw text and use an entity linking method [29] to align the drug names in the biomedical text with the KG. For each DDI, the top-n labels (n=5) are then selected from the biomedical text based on term frequency-inverse document frequency (TF-IDF) features; other textual features could be used to rank the labels instead.
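The label selection step can be sketched as follows, assuming one description document per drug pair. This is a simplified illustration rather than the authors' implementation: the stop-word list, the smoothed IDF formula, the alphabetical tie-breaking, and the example sentences are all assumptions, and entity linking is omitted.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the source does not name one).
STOP_WORDS = {"a", "an", "and", "can", "in", "is", "may", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_n_labels(docs, n=5):
    """Return the n highest-scoring TF-IDF terms for each drug pair.

    docs maps a (drug, drug) pair to its DDI description text.
    """
    tokens = {pair: tokenize(text) for pair, text in docs.items()}

    # Document frequency: in how many descriptions does each term occur?
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))

    num_docs = len(tokens)
    labels = {}
    for pair, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        scores = {
            # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
            w: (c / total) * (math.log((1 + num_docs) / (1 + doc_freq[w])) + 1.0)
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[pair] = ranked[:n]
    return labels


# Made-up example descriptions, for illustration only.
example_docs = {
    ("etanercept", "leflunomide"):
        "etanercept may enhance the toxicity of leflunomide and raise the risk of infections",
    ("warfarin", "aspirin"):
        "aspirin may enhance the risk of bleeding with warfarin",
}
labels = top_n_labels(example_docs, n=5)
```

Terms shared by many descriptions (here "enhance" and "risk") receive a lower IDF weight, so terms specific to one drug pair rank higher.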

Figure 2. A drug knowledge graph is shown on the left, with missing relations represented as dotted lines. There is usually no direct DDI relation between drugs. DDI descriptions from the biomedical text are shown on the right. The words in red mark the DDI information of interest, covering both adverse DDIs and the pharmacological ways in which drugs can interact. DDI: drug-drug interaction.

Based on this, the DDI relation between 2 drug entities is defined as a set of labels rather than as a single label, according to definition 2 (rich DDI triple): T = (E1, L) is a set of rich DDI triples in the form (u, l, v), where E1 is a set of drug entities; L is a fixed label vocabulary from biomedical text; u, v ∈ E1; and l = {n1, n2, …} ⊆ L is the set of labels describing the DDI information.

For instance, the following is an example of a rich DDI triple: (etanercept, {immunosuppressants, enhancetoxicity, anemia, infections}, leflunomide), where “enhancetoxicity” means etanercept can enhance the toxicity of leflunomide. Note that the DDI relations between 2 drugs are bidirectional; hence, our method replaces each rich DDI relation with 2 directed triples of opposing directions in the drug KG.
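The replacement of one bidirectional rich DDI relation by 2 directed triples can be sketched as below. The function name `to_directed` is hypothetical, and representing the label set as a `frozenset` is an implementation assumption made so that triples stay hashable.

```python
def to_directed(u, labels, v):
    """Expand a bidirectional rich DDI relation into 2 directed triples."""
    label_set = frozenset(labels)  # same label set in both directions
    return [(u, label_set, v), (v, label_set, u)]


directed = to_directed(
    "etanercept",
    {"immunosuppressants", "enhancetoxicity", "anemia", "infections"},
    "leflunomide",
)
```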

Formally, the generated drug KG is defined according to definition 3 (drug KG): the drug KG, G, is denoted as (E, B, T), where E = E1 ∪ E2 ∪ … ∪ E4 is a set of entities, B is a set of basic triples, and T is a set of rich DDI triples.
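Definition 3 can be mirrored by a small container that enforces the membership constraints of definitions 1 and 2 (h, t ∈ E and u, v ∈ E1). The class below is a hedged sketch, not the authors' data structure: the class and method names are hypothetical, and "E2" is only a placeholder key for a non-drug entity type, because the entity types of Table 1 are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DrugKG:
    """Drug KG G = (E, B, T)."""
    entities: dict                            # entity-type key (eg, "E1") -> set of names
    basic: set = field(default_factory=set)   # B: basic triples (h, r, t)
    rich: set = field(default_factory=set)    # T: rich DDI triples (u, l, v)

    def all_entities(self):
        """E: the union of all typed entity sets."""
        return set().union(*self.entities.values())

    def add_basic(self, h, r, t):
        """Add a basic triple; both endpoints must belong to E."""
        known = self.all_entities()
        if h not in known or t not in known:
            raise ValueError(f"unknown entity in ({h}, {r}, {t})")
        self.basic.add((h, r, t))

    def add_rich_ddi(self, u, labels, v):
        """Add a rich DDI triple; both endpoints must be drugs (E1)."""
        drugs = self.entities.get("E1", set())
        if u not in drugs or v not in drugs:
            raise ValueError(f"({u}, {v}) is not a pair of drug entities")
        self.rich.add((u, frozenset(labels), v))


kg = DrugKG({
    "E1": {"etanercept", "leflunomide"},
    "E2": {"lymphotoxin-alpha"},  # placeholder type key for a non-drug entity
})
kg.add_basic("etanercept", "hasTarget", "lymphotoxin-alpha")
kg.add_rich_ddi("etanercept", {"enhancetoxicity", "infections"}, "leflunomide")
```

Validating endpoints at insertion time keeps B and T consistent with E, so a DDI label set can never be attached to a non-drug entity.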
