Rebuilding MEDI with updated publicly available resources

Neil S. Zheng; V. Eric Kerchberger; Victor A. Borza; H. Nur Eken; Joshua C. Smith; Wei-Qi Wei

This method is extracted from research article: Sci Rep, Sep 2021

An updated, computable MEDication-Indication resource for biomedical research

DOI: 10.1038/s41598-021-98579-4

We derived medication information from six publicly available resources: RxNorm, SIDER 4.1, Mayo Clinic, WebMD, MedlinePlus, and Wikipedia. Detailed descriptions of these resources are provided in Supplementary Table 1. RxNorm and SIDER 4.1 maintain medication-indication information in structured tables, while the other four resources are free-text based and are primarily focused on consumer health information.

For RxNorm, we retrieved all medication concepts and their associated RxCUIs from the prescribable subset of RxNorm²⁷. Using the UMLS, we mapped the medications from RxNorm to indications represented by UMLS CUIs with the UMLS relationships ‘may_be_treated_by,’ ‘may_be_prevented_by,’ and ‘may_be_diagnosed_by.’ The ‘may_be_diagnosed_by’ relationship was included because it captured some true indications, such as “levothyroxine” for “disorder of thyroid gland” or “papaverine” for “erectile disorder.” SIDER 4.1 provided medication names as free text and indications as UMLS CUIs⁷. We mapped the SIDER 4.1 medication names to RxCUIs by string matching with the UMLS.
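The relationship filter described above can be illustrated with a minimal Python sketch. This is not the authors' code: the row layout (medication RxCUI, relationship name, indication CUI), the function name `collect_indications`, and all identifiers in the sample data are hypothetical placeholders; only the three relationship names come from the text.

```python
from collections import defaultdict

# Relationship names quoted in the method description.
INDICATION_RELATIONSHIPS = {
    "may_be_treated_by",
    "may_be_prevented_by",
    "may_be_diagnosed_by",
}


def collect_indications(relationship_rows):
    """Group indication CUIs by medication RxCUI.

    Each row is assumed (hypothetically) to be a tuple of
    (rxcui, relationship_name, indication_cui). Rows with any other
    relationship name are ignored.
    """
    indications = defaultdict(set)
    for rxcui, relationship, indication_cui in relationship_rows:
        if relationship in INDICATION_RELATIONSHIPS:
            indications[rxcui].add(indication_cui)
    return dict(indications)


# Placeholder identifiers, not real RxCUIs or UMLS CUIs.
rows = [
    ("RX_A", "may_be_treated_by", "CUI_1"),
    ("RX_A", "may_be_diagnosed_by", "CUI_2"),
    ("RX_A", "other_relationship", "CUI_3"),
    ("RX_B", "may_be_prevented_by", "CUI_4"),
]
result = collect_indications(rows)
```

Using a set per medication means an indication reached through more than one relationship is stored only once.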

Mayo Clinic, WebMD, and MedlinePlus all maintain directories of articles describing medications. We wrote a Python bot that automatically scraped the article titles and body text from these directories, excluding article subsections that were related to side effects or contraindications. We mapped the article titles to RxCUIs and combined articles with the same RxCUI. Articles that mapped to several RxCUIs contributed the same indications to each of the mapped medications. For Wikipedia, we extracted articles by querying Wikipedia’s application programming interface using the RxCUI concept names (i.e., medication names). We used KnowledgeMap Concept Indexer to identify medical concepts defined by UMLS CUIs in each medication document²⁸. KnowledgeMap Concept Indexer is a locally developed NLP pipeline that has been shown to effectively extract medical concepts from medical documents and online resources²⁰,²⁸,²⁹, outperforming the National Library of Medicine’s MetaMap NLP tool in precision and recall²⁸,³⁰. Medical concepts that were negated were excluded. We filtered the UMLS CUIs for the following semantic types: Disease or Syndrome, Congenital Abnormality, Acquired Abnormality, Anatomical Abnormality, Neoplastic Process, Virus.
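The post-processing of extracted concepts (dropping negated concepts, keeping the listed semantic types, and pooling articles by RxCUI) might look roughly like the sketch below. It assumes the NLP output has already been converted to plain dictionaries; the field names (`rxcuis`, `concepts`, `cui`, `semantic_type`, `negated`) and the sample identifiers are hypothetical and do not reflect the actual output format of KnowledgeMap Concept Indexer.

```python
from collections import defaultdict

# Semantic types listed in the method description.
ALLOWED_SEMANTIC_TYPES = {
    "Disease or Syndrome",
    "Congenital Abnormality",
    "Acquired Abnormality",
    "Anatomical Abnormality",
    "Neoplastic Process",
    "Virus",
}


def indications_by_rxcui(articles):
    """Pool candidate indication CUIs per RxCUI across articles.

    Negated concepts and concepts outside the allowed semantic types are
    dropped. An article mapped to several RxCUIs contributes the same
    concepts to each of them.
    """
    merged = defaultdict(set)
    for article in articles:
        kept = {
            concept["cui"]
            for concept in article["concepts"]
            if not concept["negated"]
            and concept["semantic_type"] in ALLOWED_SEMANTIC_TYPES
        }
        for rxcui in article["rxcuis"]:
            merged[rxcui] |= kept
    return dict(merged)


# Placeholder identifiers, not real RxCUIs or UMLS CUIs.
articles = [
    {
        "rxcuis": ["RX_A"],
        "concepts": [
            {"cui": "CUI_1", "semantic_type": "Disease or Syndrome", "negated": False},
            {"cui": "CUI_2", "semantic_type": "Sign or Symptom", "negated": False},
        ],
    },
    {
        "rxcuis": ["RX_A", "RX_B"],
        "concepts": [
            {"cui": "CUI_3", "semantic_type": "Neoplastic Process", "negated": False},
            {"cui": "CUI_4", "semantic_type": "Disease or Syndrome", "negated": True},
        ],
    },
]
pooled = indications_by_rxcui(articles)
```

Here `RX_A` receives concepts from both articles, while `RX_B` receives only those from the shared article.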

The final version of MEDI-2 includes separate medication-UMLS CUI and medication-ICD code relationships. The identified UMLS CUIs from each resource were mapped to ICD-9-CM and ICD-10-CM codes with the UMLS concept tables. For CUIs that did not map directly to ICD codes but did map to SNOMED CT concepts, we used SNOMED CT to ICD mappings from the National Library of Medicine (https://www.nlm.nih.gov/healthit/snomedct/archive.html; accessed January 2020). For instance, the UMLS does not map ‘Breast Carcinomas’ (CUI C067822) to ICD codes but does map it to the SNOMED CT concepts for breast cancer. For UMLS CUIs that mapped to several ICD codes, each ICD code was considered a unique indication. Based on relationships within RxNorm, all medication concepts were grouped by their generic ingredient when possible (e.g., ‘tylenol’ is in group ‘acetaminophen’). Medications that included multiple active ingredients were mapped to a combined multi-ingredient generic when possible (e.g., ‘tylenol with codeine’ mapped to ‘acetaminophen / codeine’) or to their single-ingredient components if not. We additionally regrouped MEDI-1 to generic ingredients using the same groupings as for MEDI-2 for consistency.
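The two-stage CUI-to-ICD mapping can be sketched as follows, under the assumption that the UMLS and National Library of Medicine mapping tables have been loaded into simple dictionaries. The dictionary layouts, function names, and all codes in the sample data are hypothetical placeholders, not real CUIs, SNOMED CT identifiers, or ICD codes.

```python
def map_cui_to_icd(cui, cui_to_icd, cui_to_snomed, snomed_to_icd):
    """Return ICD codes for a CUI.

    A direct CUI-to-ICD mapping is used when one exists; otherwise the
    lookup falls back to CUI -> SNOMED CT -> ICD.
    """
    direct = cui_to_icd.get(cui, ())
    if direct:
        return set(direct)
    codes = set()
    for snomed_id in cui_to_snomed.get(cui, ()):
        codes.update(snomed_to_icd.get(snomed_id, ()))
    return codes


def medication_icd_pairs(medication, cuis, cui_to_icd, cui_to_snomed, snomed_to_icd):
    """Emit one (medication, ICD code) pair per mapped code, so that a CUI
    with several ICD codes yields several separate indications."""
    pairs = set()
    for cui in cuis:
        for code in map_cui_to_icd(cui, cui_to_icd, cui_to_snomed, snomed_to_icd):
            pairs.add((medication, code))
    return sorted(pairs)


# Placeholder mapping tables.
cui_to_icd = {"CUI_1": ["ICD_A", "ICD_B"]}   # direct mapping to two codes
cui_to_snomed = {"CUI_2": ["SCT_9"]}          # no direct ICD mapping
snomed_to_icd = {"SCT_9": ["ICD_C"]}

pairs = medication_icd_pairs(
    "drug_x", ["CUI_1", "CUI_2", "CUI_3"], cui_to_icd, cui_to_snomed, snomed_to_icd
)
```

In this toy example `CUI_3` has neither a direct nor a SNOMED CT route and therefore contributes no medication-ICD pair.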

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
