Fracture, PT, and HP variables were parsed from two sources: the DICOM file headers and clinical notes (see Fig. 1b). The DICOM file headers recorded the image acquisition specifications in a tabular format. The clinical notes recorded patients’ demographics and radiologists’ interpretation times in a tabular format, and the patients’ symptoms and radiologists’ image impressions in free text. These clinical notes were retrieved via Montage (Nuance Communications, Inc, New York, NY). To abstract the presence of a symptom (i.e., pain, fall), we applied regular expressions to the noted indication. To abstract fracture from the physicians’ image interpretation, we used a word2vec-based algorithm previously described by Zech et al.46 This supervised learning algorithm required a subset of radiologists’ notes to be manually labeled as reporting either “acute fracture” or “no acute fracture”. A fracture reported anywhere in the image was considered positive (irrespective of anatomic location); if fracture was not mentioned, the report was considered an implicit negative. Radiology reports without a corresponding image were used to train the NLP algorithm, which was then used to infer fracture status on the 23,557 matched images and reports used in image model development. We manually reviewed another 100 notes used in image model development to evaluate the performance of the NLP algorithm (Supplementary Table 19). Further label processing was performed to remove infeasible values, binarize values for binary classification models, and impute missing data, as described in the supplementary methods.
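For illustration, the sketch below shows one way such a labeling pipeline could be assembled in Python. It is a minimal sketch under stated assumptions, not the study’s actual code: the regular expressions, the use of gensim word2vec embeddings averaged per report, the logistic-regression classifier, and all function and variable names are hypothetical choices standing in for details not given here; the exact method follows Zech et al.46

```python
import re
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# --- Symptom abstraction from the noted indication (regex-based) ---
# Hypothetical patterns; the study's actual regular expressions are not reported.
SYMPTOM_PATTERNS = {
    "pain": re.compile(r"\bpain\b", re.IGNORECASE),
    "fall": re.compile(r"\b(fall|fell|falling)\b", re.IGNORECASE),
}

def abstract_symptoms(indication: str) -> dict:
    """Return a binary flag per symptom found in the free-text indication."""
    return {name: int(bool(p.search(indication)))
            for name, p in SYMPTOM_PATTERNS.items()}

# --- word2vec-based fracture classifier (in the spirit of Zech et al.) ---
def tokenize(report: str) -> list:
    return re.findall(r"[a-z']+", report.lower())

def report_vector(tokens: list, w2v: Word2Vec) -> np.ndarray:
    """Average the word vectors of in-vocabulary tokens (zeros if none)."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def train_fracture_classifier(unlabeled_reports, labeled_reports, labels):
    """Fit embeddings on all reports, then a classifier on the labeled subset.

    unlabeled_reports: reports without matched images (embedding training).
    labeled_reports / labels: manually labeled subset
                              (1 = "acute fracture", 0 = "no acute fracture").
    """
    corpus = [tokenize(r) for r in unlabeled_reports + labeled_reports]
    w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=2, seed=0)
    X = np.vstack([report_vector(tokenize(r), w2v) for r in labeled_reports])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    return w2v, clf

def infer_fracture(report: str, w2v: Word2Vec, clf: LogisticRegression) -> int:
    """1 = acute fracture reported anywhere; 0 = implicit negative."""
    x = report_vector(tokenize(report), w2v).reshape(1, -1)
    return int(clf.predict(x)[0])
```

In this sketch, reports with no fracture mention simply score as negatives through the classifier rather than being handled by a separate rule; the trained `infer_fracture` would then be applied to the 23,557 matched reports to generate image-model labels.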