This was a retrospective observational analysis using data from the Optum Clinical Electronic Health Record Database, which contains deidentified and aggregated clinical and medical administrative data from more than 54 US health care delivery organizations, including more than 140 000 providers at more than 700 hospitals and 7000 clinics. These data come from all EHR capture systems submitted by participating organizations. Data are obtained from physician offices, emergency rooms, laboratories, and hospitals and include demographic information, vital signs, and other observable measurements, medications prescribed and administered, laboratory test results, administrative data for clinical and inpatient stays, and coded diagnoses and procedures. At the time of data extraction, the database contained records for approximately 47 million primarily community‐dwelling patients across the United States and Puerto Rico, with an average of 45 months of observed data per patient. Because no identifiable protected health information was extracted or accessed during the course of the study, institutional review board approval or waiver of authorization was not required.
In addition to the data described above, the key EHR data for this study comprised abstracted provider notes records, which were extracted from electronic notes via a natural language processing (NLP) system developed and maintained by Optum Analytics (OA; Boston, Massachusetts). The NLP system captures words and phrases from unstructured text in clinical notes (including conditions, signs and symptoms, family history, disease‐related scores and diagnostic procedures, medication changes, and physician rationale for prescribing decisions) and converts them into abstracted notes records that contain deidentified, consistently formatted content for analysis. The abstracted notes records output via NLP consist of the main terms, such as conditions (eg, Alzheimer disease) or symptoms (eg, agitation), accompanied by additional data fields that provide context; these supporting fields contain terms relating to severity/frequency/duration, body part or measurement value, medical chart section, and qualifiers such as negation, progress in the diagnostic process, or input from family members. Main terms of interest for the NLP system were identified using vocabulary from the Unified Medical Language System, which includes medical dictionaries such as the Logical Observation Identifiers Names and Codes, the Systematized Nomenclature of Medicine‐Clinical Terms, and RxNorm (a listing of generic and branded drugs), among others. New NLP concepts are created, and the performance of the NLP system is verified, by a team of medical terminologists and clinicians from OA that assesses the accuracy of the NLP output compared with a manual review of sample EHR notes.
For this study, the abstracted notes records used to identify AD/dementia and agitation were reviewed manually by Optum's medical director to determine whether the overall combinations of terms in the notes record fields were indicative of probable AD/dementia and agitation (eg, pertained to patient behavior or disposition and were not negated).
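As an illustration only, the structure of an abstracted notes record (a main term plus supporting context fields) and the exclusion of negated mentions might be sketched as follows. The field names, the qualifier values, and the helper functions are hypothetical; they are not the schema or logic of the Optum NLP system, and in this study the determination itself was made by manual review rather than by code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AbstractedNote:
    """Hypothetical layout of one abstracted notes record."""
    patient_id: str
    main_term: str                        # e.g. "agitation"
    severity: Optional[str] = None        # severity/frequency/duration
    chart_section: Optional[str] = None   # medical chart section
    qualifier: Optional[str] = None       # e.g. "negated", "family reported"


# Assumed qualifier values that mark a mention as negated.
NEGATION_QUALIFIERS = {"negated", "denies", "ruled out"}


def is_candidate(note: AbstractedNote, term: str) -> bool:
    """True if the record mentions `term` and is not flagged as negated."""
    if note.main_term.lower() != term.lower():
        return False
    qualifier = (note.qualifier or "").lower()
    return qualifier not in NEGATION_QUALIFIERS


def candidates_for_review(notes: List[AbstractedNote], term: str) -> List[AbstractedNote]:
    """Keep only records that a reviewer would still need to assess."""
    return [n for n in notes if is_candidate(n, term)]


notes = [
    AbstractedNote("p1", "agitation", severity="severe", chart_section="assessment"),
    AbstractedNote("p2", "agitation", qualifier="negated"),
    AbstractedNote("p3", "Alzheimer disease", chart_section="history"),
]
for note in candidates_for_review(notes, "agitation"):
    print(note.patient_id, note.main_term, note.severity)
```

A filter of this kind could only narrow the set of records passed to a reviewer; deciding whether a combination of terms indicates probable AD/dementia and agitation remains a clinical judgment.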