Knowledge graphs, an AI technology for abstracting, organizing and integrating knowledge extracted from multiple data sources, have emerged as a tool for biomedical discovery. We constructed a knowledge graph-based prediction system, KG-Predict, that prioritizes candidate drugs for a given input disease by modeling the interconnections between drugs, genes, diseases and phenotypical annotations from publicly available phenome-level databases, genome-level databases and text-mined knowledge bases [26, 27]. In KG-Predict, the associations between drugs and their corresponding phenotypes were obtained from the Phenomebrowser database [29]. The associations between genes and their functions were obtained from the Gene Ontology Annotation (GOA) [30], Mouse Genome Informatics (MGI) [31] and Genotype-Tissue Expression (GTEx) [32] databases. The associations between diseases and phenotype ontologies were obtained from the Human Phenotype Ontology (HPO) database [33]. The associations between drugs and genes were obtained from the DrugBank database [34]. Disease–gene interactions were extracted from the MGI database. Drug–disease interactions were mined by natural language processing (NLP) techniques from patient records in FAERS, FDA drug labels, MEDLINE abstracts and clinical trial studies [35, 36]. The knowledge graph in KG-Predict was composed of seven types of entities (e.g. drugs, genes, diseases, phenotypical annotations) linked by nine types of semantic relations (e.g. drug–target–gene, gene–associate–GOA). More details are provided in Table S1 in the supporting information.
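To make the graph structure concrete, the typed entities and semantic relations described above can be sketched as (head, relation, tail) triples. This is only an illustrative data-structure sketch and not KG-Predict's prediction model: the entity identifiers (`drug:D1`, `gene:G1`, `goa:T1`), the `type:id` naming scheme and the helper functions are hypothetical. Only the two relation names are taken from the text.

```python
from collections import defaultdict

# Hypothetical placeholder triples (head, relation, tail).
# Only the two relation names come from the text; every entity ID is invented.
TRIPLES = [
    ("drug:D1", "drug-target-gene", "gene:G1"),
    ("drug:D2", "drug-target-gene", "gene:G2"),
    ("drug:D3", "drug-target-gene", "gene:G1"),
    ("gene:G1", "gene-associate-GOA", "goa:T1"),
    ("gene:G2", "gene-associate-GOA", "goa:T1"),
]


def entity_type(node):
    """Return the entity type encoded as the prefix of a 'type:id' node name."""
    return node.split(":", 1)[0]


def index_by_relation(triples):
    """Group triples as relation -> head -> set of tails for fast lookup."""
    index = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        index[relation][head].add(tail)
    return index


def drugs_targeting(index, genes):
    """List drugs linked to at least one of the given genes by drug-target-gene."""
    wanted = set(genes)
    hits = [
        drug
        for drug, targets in index["drug-target-gene"].items()
        if targets & wanted
    ]
    return sorted(hits)


if __name__ == "__main__":
    kg = index_by_relation(TRIPLES)
    print(drugs_targeting(kg, ["gene:G1"]))
```

A direct lookup like `drugs_targeting` only follows one relation type. Prioritizing drugs by their combined genetic, genomic and phenotypical relevance would require using the other relation types as well, which this sketch does not attempt.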
In this study, the input to KG-Predict is a list of CUD-associated genes, and the output is a list of candidate drugs prioritized by their genetic, genomic and phenotypical relevance to CUD. To collect CUD-associated genes, we first obtained 383 genes associated with cocaine-related diseases (cocaine dependence, cocaine abuse and cocaine-related disorder) from DisGeNET [37], a discovery platform with one of the largest publicly available collections of genes and variants associated with human diseases. We used the median score of 0.5 as the threshold; at this cut-off, 19 genes were associated with cocaine-related diseases. We also obtained three CUD-associated genes from the published literature [38–40]. The resulting list of 22 CUD-associated genes comprised DRD3, GABRA2, CAMK4, MECP2, OPRK1, COMT, CREB1, CARTPT, CRH, CNR1, CRHR1, OPRM1, SLC6A4, NPY, PDYN, DRD2, HTR1B, SLC6A3, EGR1, GAD1, GABRB3 and BDNF.
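The gene-selection step can be sketched as a score filter followed by a union with the literature-derived genes. This is a minimal sketch under stated assumptions: the function name `build_seed_genes`, the placeholder gene names and all scores are hypothetical and are not DisGeNET values, and treating the cut-off as inclusive (score ≥ 0.5) is an assumption, since the text does not say whether the threshold is inclusive.

```python
def build_seed_genes(scored_genes, literature_genes, cutoff=0.5):
    """Keep genes scoring at or above the cut-off, then append literature genes.

    scored_genes: dict mapping gene symbol -> association score.
    literature_genes: iterable of gene symbols curated from publications.
    The inclusive comparison (>=) is an assumption, not stated in the text.
    """
    selected = [gene for gene, score in scored_genes.items() if score >= cutoff]
    for gene in literature_genes:
        if gene not in selected:  # avoid duplicates across the two sources
            selected.append(gene)
    return selected


if __name__ == "__main__":
    # Placeholder symbols and scores for illustration only; not real data.
    scores = {"GENE_A": 0.7, "GENE_B": 0.5, "GENE_C": 0.3, "GENE_D": 0.1}
    from_literature = ["GENE_C", "GENE_E"]
    print(build_seed_genes(scores, from_literature))
```

In the study itself, the analogous step yields the 19 genes that pass the threshold plus the three literature-derived genes, giving the 22-gene input list.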