The health lexicon, used for extracting health-related tweets, combines keywords from three different sources in order to minimize bias [51]. These sources include are:

an annotator—a graduate linguist and native Arabic speaker who reviewed health-related accounts and hashtags to identify 110 health-related words.

field experts—three medical doctors who are active on Twitter; they suggested 100 health-specific words that typically occur in health-related tweets.

an existing health dictionary—we took 232 words from the Arabic health dictionary proposed by Collier et al. [52]. These 232 words are the only words out of all 968 words in the dictionary, that occur in the tweets we have collected.

We then combined all the words. However, we found that there were still some words not specific to health in the lexicon, resulting in a high number of false positive tweets. Thus, similar to Hicks et al. [53], Prus et al. [54] and Zhang and Ahmed [55], we removed these words. Our final lexicon consists of 263 Arabic health-related terms created from the sources above. It is available at

Note: The content above has been extracted from a research article, so it may not display correctly.

Please log in to submit your questions online.
Your question will be posted on the Bio-101 website. We will send your questions to the authors of this protocol and Bio-protocol community members who are experienced with this method. you will be informed using the email address associated with your Bio-protocol account.

We use cookies on this site to enhance your user experience. By using our website, you are agreeing to allow the storage of cookies on your computer.