We derived medication information from six publicly available resources: RxNorm, SIDER 4.1, Mayo Clinic, WebMD, MedlinePlus, and Wikipedia. Detailed descriptions of these resources are provided in Supplementary Table 1. RxNorm and SIDER 4.1 maintain medication-indication information in a structured table, while the other four resources are free-text based and are primarily focused on consumer health information.
For RxNorm, we retrieved all medication concepts and their associated RxCUIs from the prescribable subset of RxNorm27. Using the UMLS, we mapped the medications from RxNorm to indications represented by UMLS CUIs with the UMLS relationships ‘may_be_treated_by,’ ‘may_be_prevented_by,’ and ‘may_be_diagnosed_by.’ The ‘may_be_diagnosed_by’ relationship was included because it captured some true indications, such as “levothyroxine” for “disorder of thyroid gland” and “papaverine” for “erectile disorder.” SIDER 4.1 provided medication names as free text and indications as UMLS CUIs7. We mapped the SIDER 4.1 medication names to RxCUIs by string matching with the UMLS.
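The relationship-based selection described above can be illustrated with a minimal sketch. It assumes the UMLS relationship rows have already been loaded into memory. The `Relation` record, the function name, and all identifiers in the example rows are hypothetical placeholders, not real RxCUIs or CUIs, and the field layout is not that of any UMLS file.

```python
from collections import defaultdict
from typing import NamedTuple

# The three relationship labels named in the text.
INDICATION_RELATIONSHIPS = {
    "may_be_treated_by",
    "may_be_prevented_by",
    "may_be_diagnosed_by",
}


class Relation(NamedTuple):
    """Hypothetical in-memory form of one relationship row."""
    indication_cui: str
    relationship: str
    medication_rxcui: str


def indications_by_medication(relations):
    """Collect indication CUIs per medication, keeping only the three labels."""
    result = defaultdict(set)
    for rel in relations:
        if rel.relationship in INDICATION_RELATIONSHIPS:
            result[rel.medication_rxcui].add(rel.indication_cui)
    return dict(result)


# Placeholder rows; the identifiers are made up for illustration.
rows = [
    Relation("CUI_THYROID", "may_be_diagnosed_by", "RXCUI_LEVO"),
    Relation("CUI_OTHER", "some_other_relationship", "RXCUI_LEVO"),
    Relation("CUI_ED", "may_be_treated_by", "RXCUI_PAPA"),
]
mapping = indications_by_medication(rows)
```

Rows with any other relationship label are ignored, so a medication appears in the result only if it has at least one of the three relationship types.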
Mayo Clinic, WebMD, and MedlinePlus all maintain directories of articles describing medications. We wrote a Python bot that automatically scraped the article titles and body text from these directories, excluding article subsections related to side effects or contraindications. We mapped the article titles to RxCUIs and combined articles with the same RxCUI. Articles that mapped to several RxCUIs contributed the same indications to each of the mapped medications. For Wikipedia, we extracted articles by querying Wikipedia’s application programming interface using the RxCUI concept names (i.e., the medication names). We used KnowledgeMap Concept Indexer to identify medical concepts defined by UMLS CUIs in each medication document28. KnowledgeMap Concept Indexer is a locally developed NLP pipeline that has been shown to effectively extract medical concepts from medical documents and online resources20,28,29, outperforming the National Library of Medicine’s MetaMap NLP tool in precision and recall28,30. Negated medical concepts were excluded. We restricted the UMLS CUIs to the following semantic types: Disease or Syndrome, Congenital Abnormality, Acquired Abnormality, Anatomical Abnormality, Neoplastic Process, and Virus.
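The filtering and merging steps for the free-text resources can be sketched as follows. This assumes the NLP step has already produced, for each article, a list of extracted concepts with a semantic type and a negation flag. The `Concept` record, the function names, and the example identifiers are hypothetical and do not reflect the actual output format of KnowledgeMap Concept Indexer.

```python
from collections import defaultdict
from dataclasses import dataclass

# Semantic types retained, as listed in the text.
ALLOWED_SEMANTIC_TYPES = {
    "Disease or Syndrome",
    "Congenital Abnormality",
    "Acquired Abnormality",
    "Anatomical Abnormality",
    "Neoplastic Process",
    "Virus",
}


@dataclass(frozen=True)
class Concept:
    """Hypothetical record for one concept extracted from an article."""
    cui: str
    semantic_type: str
    negated: bool


def candidate_indications(concepts):
    """Keep CUIs that are not negated and have an allowed semantic type."""
    return {
        c.cui
        for c in concepts
        if not c.negated and c.semantic_type in ALLOWED_SEMANTIC_TYPES
    }


def indications_per_rxcui(articles):
    """Merge articles by RxCUI.

    `articles` is an iterable of (rxcuis, concepts) pairs, one per article.
    An article mapped to several RxCUIs contributes to each of them.
    """
    merged = defaultdict(set)
    for rxcuis, concepts in articles:
        kept = candidate_indications(concepts)
        for rxcui in rxcuis:
            merged[rxcui] |= kept
    return dict(merged)


# Placeholder articles; all identifiers are made up for illustration.
articles = [
    (
        ["RXCUI_A", "RXCUI_B"],
        [
            Concept("CUI_1", "Disease or Syndrome", False),
            Concept("CUI_2", "Disease or Syndrome", True),  # negated, dropped
            Concept("CUI_3", "Finding", False),  # type not allowed, dropped
        ],
    ),
    (["RXCUI_A"], [Concept("CUI_4", "Neoplastic Process", False)]),
]
merged = indications_per_rxcui(articles)
```

Using sets means that a concept mentioned in several articles for the same medication is counted once.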
The final version of MEDI-2 includes separate medication-UMLS CUI and medication-ICD code relationships. The identified UMLS CUIs from each resource were mapped to ICD-9-CM and ICD-10-CM codes with the UMLS concept tables. For CUIs that did not map directly to ICD but did map to SNOMED-CT concepts, we used SNOMED-CT to ICD mappings from the National Library of Medicine (https://www.nlm.nih.gov/healthit/snomedct/archive.html; accessed January 2020). For instance, the UMLS does not map ‘Breast Carcinomas’ (CUI C067822) to ICD codes but does map it to the SNOMED-CT concepts for breast cancer. For UMLS CUIs that mapped to several ICD codes, each ICD code was considered a unique indication. Based on relationships within RxNorm, all medication concepts were grouped by their generic ingredient when possible (e.g., ‘tylenol’ is in the group ‘acetaminophen’). Medications that included multiple active ingredients were mapped to a combined multi-ingredient generic when possible (e.g., ‘tylenol with codeine’ mapped to ‘acetaminophen / codeine’) or to their single-ingredient components if not. For consistency, we additionally regrouped MEDI-1 to generic ingredients using the same groupings as for MEDI-2.
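The CUI-to-ICD step with its SNOMED-CT fallback can be sketched as below. This assumes the three mapping tables have been loaded as plain dictionaries. The function names, dictionary names, and every identifier in the example are hypothetical placeholders, not real CUIs, SNOMED-CT identifiers, or ICD codes.

```python
def icd_codes_for_cui(cui, cui_to_icd, cui_to_snomed, snomed_to_icd):
    """Return ICD codes for a CUI, using SNOMED-CT only when no direct map exists."""
    direct = cui_to_icd.get(cui)
    if direct:
        return set(direct)
    codes = set()
    for snomed_id in cui_to_snomed.get(cui, ()):
        codes.update(snomed_to_icd.get(snomed_id, ()))
    return codes


def medication_icd_pairs(medication, cuis, cui_to_icd, cui_to_snomed, snomed_to_icd):
    """Emit one (medication, ICD code) pair per mapped code.

    A CUI that maps to several ICD codes yields several separate indications.
    """
    pairs = set()
    for cui in cuis:
        for code in icd_codes_for_cui(cui, cui_to_icd, cui_to_snomed, snomed_to_icd):
            pairs.add((medication, code))
    return pairs


# Placeholder tables; all identifiers are made up for illustration.
cui_to_icd = {"CUI_DIRECT": ["ICD_1", "ICD_2"]}
cui_to_snomed = {"CUI_VIA_SNOMED": ["SCT_1"]}
snomed_to_icd = {"SCT_1": ["ICD_3"]}

pairs = medication_icd_pairs(
    "MED_A",
    ["CUI_DIRECT", "CUI_VIA_SNOMED", "CUI_UNMAPPED"],
    cui_to_icd,
    cui_to_snomed,
    snomed_to_icd,
)
```

A CUI with neither a direct ICD mapping nor a SNOMED-CT route contributes no medication-ICD pair in this sketch; it would still be retained in the separate medication-UMLS CUI relationships.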