Natural Language Processing for Analysis of Clinical Notes with Delayed Start-of-Care

Maryam Zolnoori
Jiyoun Song
Margaret V McDonald
Yolanda Barrón
Kenrick Cato
Paulina Sockolow
Sridevi Sridharan
Nicole Onorato
Kathryn H Bowles
Maxim Topaz

We developed an NLP algorithm using a regular expression technique [24], in which algorithmic rules were crafted to match language patterns describing reasons for delayed visits. A regular expression is a text search pattern built from alphabetic characters, numeric characters, and nonalphanumeric characters. The regular expression approach can find predefined lists of keywords in text [29]. The goal was to capture as much lexical variation as possible using keywords and language patterns.

As the first step in developing the NLP algorithm, we preprocessed the data by lowercasing, stripping (removing extra spaces), and removing special characters (eg, [, \, ^, $, |, ?, *, +, (, ), ]). We did not remove stop words because they were important for identifying the reasons for delayed start-of-care. For example, negation indicators (such as no and not) and auxiliary words (such as do, have, and has) were important for identifying whether a patient requested to postpone the HHC services. We also did not perform lemmatization or stemming because keeping the original form of words, particularly verbs, was important for identifying patterns of late visits.
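The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `preprocess` and the choice to replace each special character with a space (rather than deleting it outright) are assumptions made here so that adjacent words do not fuse together.

```python
import re

# Special characters listed in the text: [ \ ^ $ | ? * + ( ) ]
SPECIAL_CHARS = re.compile(r"[\[\]\\^$|?*+()]")
EXTRA_SPACES = re.compile(r"\s+")


def preprocess(note: str) -> str:
    """Lowercase a note, drop the listed special characters, and
    collapse extra whitespace (hypothetical helper for illustration)."""
    text = note.lower()
    # Assumption: substitute a space so neighboring words stay separated.
    text = SPECIAL_CHARS.sub(" ", text)
    # Collapse runs of whitespace and trim both ends.
    text = EXTRA_SPACES.sub(" ", text).strip()
    # Stop words are deliberately kept, and no stemming or
    # lemmatization is applied, as described in the text.
    return text
```

For instance, `preprocess("  Pt DECLINED  VNS visit (today)?  ")` yields `"pt declined vns visit today"`, with negations and auxiliary words left in place.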

As the second step in developing the NLP algorithm, we created regular expression rules to capture language describing patient or family requests to postpone HHC or to refuse some HHC services. Examples include the following phrases: “pt declined VNS visit today,” “patient refused visit and asks for visit tomorrow,” and “daughter asks to reschedule SOC for Friday.” These phrases share word patterns, such as “patient/pt/family refused,” “asks for visit tomorrow,” and “reschedule SOC for Friday,” that were used to develop the regular expression rules. Table 1 provides more examples of regular expressions applied to identify reasons for delayed start-of-care HHC nursing visits in clinical notes. Multimedia Appendix 1 provides examples of the regular expression syntax used to identify patterns of delayed start-of-care for the specified categories.
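To make the rule-building step concrete, the sketch below shows how the word patterns quoted above might be turned into regular expressions. The rule names, the exact patterns, and the `match_rules` helper are hypothetical and are modeled only on the three example phrases; the rules actually used in the study are those given in Table 1 and Multimedia Appendix 1. The input is assumed to be already lowercased.

```python
import re

# Hypothetical rules modeled on the example phrases in the text.
RULES = {
    # "patient/pt/family refused", extended here to "declined"
    "refused_or_declined": re.compile(
        r"\b(?:patient|pt|family|daughter|son)\s+(?:refused|declined)\b"
    ),
    # "asks for visit tomorrow"
    "asks_for_later_visit": re.compile(
        r"\basks\s+for\s+(?:a\s+)?visit\s+(?:tomorrow|on\s+\w+)\b"
    ),
    # "reschedule soc for friday"
    "reschedule_soc": re.compile(
        r"\breschedule\s+(?:soc|start of care|visit)\s+(?:for|to)\s+\w+\b"
    ),
}


def match_rules(text: str) -> list[str]:
    """Return the names of all rules whose pattern occurs in the text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


examples = [
    "pt declined vns visit today",
    "patient refused visit and asks for visit tomorrow",
    "daughter asks to reschedule soc for friday",
]
for phrase in examples:
    print(phrase, "->", match_rules(phrase))
```

Keeping each pattern under a named key makes it straightforward to report which category of delay language a note triggered, and a single note can match more than one rule.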

Table 1. Examples of regular expressions applied to identify reasons for delayed start-of-care HHC^a nursing visits in clinical notes.

^a HHC: home health care.

^b Italicized text denotes the identified language in the clinical notes that indicates reasons for delayed start-of-care HHC nursing visits.

^c VM: voice message.

^d MSG: message.

^e SOC: start-of-care.
