Disease information was downloaded from the Comparative Toxicogenomics Database (CTD) [14]. Diseases were screened against the following criteria to compile a distinct set of names that could be counted unambiguously in the text annotations of pathway databases: (1) a disease name is not an extension of another disease name from the same disease; (2) a disease name is not related to a psychological condition; (3) a disease is not a category for multiple diseases otherwise included (e.g., neurodegenerative disease); (4) a disease is not related to an environmental condition (e.g., mite infestations); (5) a disease is not an alias for another disease; (6) a disease is not a symptom (e.g., abdominal pain); (7) a disease name is not ambiguous relative to included disease names (e.g., cancer). After filtering, 876 disease names remained. Text titles and descriptions (or captions) were collected for each of the human pathways from PFOCR, Reactome, WikiPathways, and KEGG. Case-insensitive string matching functions were used to identify disease name occurrences in the collected text samples. A match was counted only once per pathway, even if the disease name occurred multiple times within or across text samples for that pathway. The resulting pathway counts per disease and per database are shown in Supplement Table 1, and a subset is shown in Table 1. Reactome and WikiPathways provide additional sources of disease annotation, including ontology tags, gene descriptions, and bibliography titles, which we did not include in this accounting in order to make a fair comparison across all four resources.
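The counting rule described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `count_pathways_per_disease`, the input layout (a dictionary keyed by database and pathway identifier), and the use of plain substring matching are all assumptions made for illustration, since the text specifies only that matching was case-insensitive and that each pathway was counted at most once per disease.

```python
from collections import defaultdict


def count_pathways_per_disease(disease_names, pathway_texts):
    """Count, per disease and per database, the pathways that mention the disease.

    disease_names: iterable of disease name strings (hypothetical input).
    pathway_texts: dict mapping (database, pathway_id) to a list of text
        samples (title, description, caption) for that pathway (hypothetical layout).
    """
    counts = defaultdict(lambda: defaultdict(int))
    # Lowercase each disease name once so that matching is case-insensitive.
    needles = [(name, name.lower()) for name in disease_names]
    for (database, _pathway_id), samples in pathway_texts.items():
        lowered_samples = [sample.lower() for sample in samples]
        for name, needle in needles:
            # Assumption: plain substring matching. any() yields a single hit
            # per pathway no matter how many times the name occurs.
            if any(needle in sample for sample in lowered_samples):
                counts[name][database] += 1
    return {name: dict(per_db) for name, per_db in counts.items()}
```

With this layout, a pathway whose title and description both mention a disease contributes a count of one for that disease in its database, which mirrors the once-per-pathway rule stated above.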
To investigate disease gene coverage of the pathway databases, the human disease gene file 'human_disease_knowledge_filtered.tsv' was downloaded from Jensen DISEASES [98]. Jensen disease names that exactly matched the CTD disease names were selected for investigation. For each pathway database, the number of each disease's genes present in integrated pathways was determined and also expressed as a percentage of the number of genes that Jensen DISEASES defines for that disease.
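The coverage calculation can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' implementation: the function name `disease_gene_coverage` and the in-memory inputs (a dictionary of disease gene sets and a set of pathway genes for one database) are hypothetical, and the exact name match is treated as case-sensitive, which the text does not specify.

```python
def disease_gene_coverage(jensen_genes, ctd_names, pathway_genes):
    """Return {disease: (covered_count, covered_percent)} for one pathway database.

    jensen_genes: dict mapping a Jensen DISEASES disease name to its set of genes.
    ctd_names: set of filtered CTD disease names.
    pathway_genes: set of genes found in the pathways of one database.
    All three inputs are hypothetical in-memory stand-ins for the real files.
    """
    coverage = {}
    for disease, genes in jensen_genes.items():
        # Keep only Jensen diseases whose name exactly matches a CTD name
        # (assumption: the comparison is case-sensitive).
        if disease not in ctd_names or not genes:
            continue
        covered = genes & pathway_genes
        percent = 100.0 * len(covered) / len(genes)
        coverage[disease] = (len(covered), percent)
    return coverage
```

Running the function once per database, with that database's gene set as `pathway_genes`, would give the per-database counts and percentages described above.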