Fracture, PT, and HP variables were parsed from two sources: the DICOM file headers and clinical notes (see Fig. 1b). The DICOM file headers recorded the image acquisition specifications in a tabular format. The clinical notes recorded patients’ demographics and radiologists’ interpretation times in a tabular format, and the patients’ symptoms and radiologists’ image impressions in free text. These clinical notes were retrieved via Montage (Nuance Communications, Inc, New York, NY). To abstract the presence of a symptom (i.e., pain, fall), we applied regular expressions to the noted indication. To abstract fracture from the physicians’ image interpretation, we used a word2vec-based algorithm previously described by Zech et al.46 This supervised learning algorithm required a subset of radiologists’ notes to be manually labeled as reporting either “acute fracture” or “no acute fracture”. A fracture reported anywhere in the image was considered positive (irrespective of anatomic location); if fracture was not mentioned, the report was considered an implicit negative. Radiology reports without a corresponding image were used to train the NLP algorithm, which was then used to infer fracture status on the 23,557 matched images and reports used in image model development. We manually reviewed another 100 notes used in image model development to evaluate the performance of the NLP algorithm (Supplementary Table 19). Further label processing was performed to remove infeasible values, binarize values for binary classification models, and impute missing data, as described in the supplementary methods.
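For illustration, the sketch below shows one way such a labeling pipeline could be assembled in Python. It is a minimal sketch under stated assumptions, not the study’s actual code: the regular expressions, the use of gensim word2vec embeddings averaged per report, the logistic-regression classifier, and all function and variable names are hypothetical choices standing in for details not given here; the exact method follows Zech et al.46

```python
import re
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# --- Symptom abstraction from the noted indication (regex-based) ---
# Hypothetical patterns; the study's actual regular expressions are not reported.
SYMPTOM_PATTERNS = {
    "pain": re.compile(r"\bpain\b", re.IGNORECASE),
    "fall": re.compile(r"\b(fall|fell|falling)\b", re.IGNORECASE),
}

def abstract_symptoms(indication: str) -> dict:
    """Return a binary flag per symptom found in the free-text indication."""
    return {name: int(bool(p.search(indication)))
            for name, p in SYMPTOM_PATTERNS.items()}

# --- word2vec-based fracture classifier (in the spirit of Zech et al.) ---
def tokenize(report: str) -> list:
    return re.findall(r"[a-z']+", report.lower())

def report_vector(tokens: list, w2v: Word2Vec) -> np.ndarray:
    """Average the word vectors of in-vocabulary tokens (zeros if none)."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def train_fracture_classifier(unlabeled_reports, labeled_reports, labels):
    """Fit embeddings on all reports, then a classifier on the labeled subset.

    unlabeled_reports: reports without matched images (embedding training).
    labeled_reports / labels: manually labeled subset
                              (1 = "acute fracture", 0 = "no acute fracture").
    """
    corpus = [tokenize(r) for r in unlabeled_reports + labeled_reports]
    w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=2, seed=0)
    X = np.vstack([report_vector(tokenize(r), w2v) for r in labeled_reports])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    return w2v, clf

def infer_fracture(report: str, w2v: Word2Vec, clf: LogisticRegression) -> int:
    """1 = acute fracture reported anywhere; 0 = implicit negative."""
    x = report_vector(tokenize(report), w2v).reshape(1, -1)
    return int(clf.predict(x)[0])
```

In this sketch, reports with no fracture mention simply score as negatives through the classifier rather than being handled by a separate rule; the trained `infer_fracture` would then be applied to the 23,557 matched reports to generate image-model labels.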