2.2. Manual Data Annotation


Prior to NLP-based classification, the raw data had to be “labeled” for supervised machine learning. In total, 2000 tweets were manually double-annotated by a group of three individuals for the following: (1) relation to telehealth (yes, no); (2) sentiment (positive, neutral, negative); (3) user category (clinician, consumer, policymaker, vendor, other); and (4) relation to COVID-19 (yes, no). A tweet received a positive sentiment label if it contained optimistic, encouraging, or validating language, e.g., “Telehealth is a valuable tool to provide care; protect people in this COVID19 pandemic”. A tweet received a negative sentiment label if it contained emotional words that conveyed pessimistic, debasing, or discouraging feelings, e.g., “Do you want people to keep dying and you aren’t doing anything about it?” Finally, a tweet received a neutral sentiment label if it included neither negative nor positive words; these tweets frequently expressed educational or objective informational phrases, e.g., “Effects of a telehealth educational intervention on medication adherence”. When annotating telehealth-related tweets for sentiment, a tweet could mention both telehealth and a sentiment yet have that sentiment directed at something other than telehealth. Annotators were therefore trained to evaluate sentiment only as it related to telehealth, and these data were used to train the machine learning model.
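
For concreteness, the four annotation dimensions can be represented as a simple record type. The sketch below is illustrative only: the `TweetAnnotation` class, its field names, and the example labels are our own, not artifacts of the study; only the label sets come from the list above.

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative record for one manually annotated tweet; field names are
# hypothetical, label values follow the four dimensions listed above.
@dataclass
class TweetAnnotation:
    tweet_id: str
    telehealth: Literal["yes", "no"]
    sentiment: Literal["positive", "neutral", "negative"]
    user_category: Literal["clinician", "consumer", "policymaker", "vendor", "other"]
    covid_related: Literal["yes", "no"]

# Hypothetical labeling of the positive-sentiment tweet quoted above.
example = TweetAnnotation(
    tweet_id="t0001",
    telehealth="yes",
    sentiment="positive",
    user_category="other",
    covid_related="yes",
)
```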

A user was regarded as a clinician if the tweet contained key phrases signaling clinical events or activities, such as “Excited to speak to residents about ethics and telemedicine in medical careers”. A user was regarded as a consumer if the tweet contained phrases signaling they had used the technology as a patient or an obvious third party, such as “Telemedicine is being offered. Have a video session next week.” A user was regarded as a vendor if the tweet included phrases suggesting the user had an economic stake in promoting a product or service, such as “Dermatology Telemedicine Physician seeking Dermatologists to join”. The user was considered a policymaker if the tweet discussed a policy, governmental entity, or institutional course of action. The user was considered “other” if the tweet could not be easily placed into any one category. Any case-insensitive use of the terms “covid”, “covid-19”, or “coronavirus” indicated a relationship to COVID-19.
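
Because the COVID-19 rule is purely lexical, it can be expressed as a short case-insensitive keyword check. The function below is a minimal sketch of that rule; the function name is our own, not the study's.

```python
def is_covid_related(text: str) -> bool:
    """Case-insensitive keyword rule described in the protocol: a tweet is
    related to COVID-19 if it mentions 'covid', 'covid-19', or 'coronavirus'."""
    lowered = text.lower()
    # The substring 'covid' also covers 'covid-19', so two checks suffice.
    return "covid" in lowered or "coronavirus" in lowered

# Quick checks using the tweets quoted above.
assert is_covid_related("Telehealth is a valuable tool to provide care; "
                        "protect people in this COVID19 pandemic")
assert not is_covid_related("Effects of a telehealth educational "
                            "intervention on medication adherence")
```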

The first 200 tweets were manually annotated by all three annotators, then reconciled as a group that included an expert in NLP (KR) to calibrate the annotations. The remaining 1800 tweets were double-annotated by two of the three annotators. All remaining disagreements over tweet classification were then reconciled to ensure a consistent set of manual annotations. Annotator agreement with the reconciled standard was 0.78 for telehealth (Cohen’s kappa), 0.78 for COVID-19-relatedness (Cohen’s kappa), 0.77 for user category (Fleiss’ kappa), and 0.67 for sentiment (Fleiss’ kappa). All 2000 manually annotated tweets were used to train and evaluate the NLP model, as described in the next section.
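
A minimal sketch of how such chance-corrected agreement statistics can be computed, using scikit-learn for Cohen's kappa and statsmodels for Fleiss' kappa. The toy label arrays are invented for illustration and are not the study's annotations.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Cohen's kappa: one annotator's binary telehealth labels against the
# reconciled standard (toy data).
annotator = ["yes", "no", "yes", "yes", "no", "yes"]
standard = ["yes", "no", "no", "yes", "no", "yes"]
print("Cohen's kappa:", cohen_kappa_score(annotator, standard))

# Fleiss' kappa: rows are tweets, columns are raters' sentiment labels
# (toy data). aggregate_raters converts the ratings into the
# tweets-by-category count table that fleiss_kappa expects.
ratings = np.array([
    ["positive", "positive"],
    ["neutral", "negative"],
    ["negative", "negative"],
    ["neutral", "neutral"],
])
table, _ = aggregate_raters(ratings)
print("Fleiss' kappa:", fleiss_kappa(table))
```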
