Discriminative keywords
This protocol is extracted from research article:
The public and legislative impact of hyperconcentrated topic news
Sci Adv, Aug 28, 2019; DOI: 10.1126/sciadv.aat8296

We are interested in identifying and summarizing those aspects of a domain’s current framing that distinguish it from the domain’s framing at a previous time period. To this end, we adopted the idea of an entropic formulation of discriminative keywords, as proposed by Sheshadri et al. (26).

Below, a corpus $T$ is a set of news articles. Specifically, given two disjoint sets of news articles $T_1$ and $T_2$, we identified a set of $k$ n-grams that yield the largest information gain (29) in the combined corpus $T = T_1 \cup T_2$. Let $A$ be an article in corpus $T$. Let $x_i$ represent any of the $m$ possible n-grams in $T$. Let $S(x_i, T) = \{A \in T \mid x_i \in A\}$ be the set of articles in corpus $T$ in which the n-gram $x_i$ appears. We used a $|T| \times m$ term frequency matrix representing the corpus to calculate $H$, the information entropy of $T$. We used MATLAB’s fitctree and predictor importance functions with a split criterion parameter of “deviance” to estimate the utility of each n-gram.

$$IG(T, x_i) = H(T) - \frac{|S(x_i, T)|}{|T|} \, H(S(x_i, T)) \qquad (1)$$

Following Entman’s (9) formulation, this approach weights n-grams that are specific to a particular corpus more highly than n-grams that are common to both corpora. A quick intuition for the approach is obtained by considering that the unigram “Snowden” may have a high utility in distinguishing Surveillance articles published after 1 January 2014 from those published before that date, whereas the unigram “surveillance” is common to articles from both periods and therefore may not. Because keywords from a particular news corpus distinguish it from others, they may be said to represent the “concentration” of news in that corpus.
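
As a rough illustration of Equation 1 (not the authors’ MATLAB fitctree pipeline), the Python sketch below scores unigrams on a four-article toy corpus. It assumes that H is the Shannon entropy of the T1/T2 membership labels and that each article is represented as a set of n-grams. The function names (`label_entropy`, `information_gain`, `top_k_ngrams`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 if empty."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(articles, labels, ngram):
    """Equation 1: IG(T, x) = H(T) - |S(x, T)| / |T| * H(S(x, T)).

    Assumption: H is the entropy of the corpus labels (T1 vs. T2).
    """
    # S(x, T): labels of the articles in which the n-gram appears.
    subset = [lab for art, lab in zip(articles, labels) if ngram in art]
    weight = len(subset) / len(labels)
    return label_entropy(labels) - weight * label_entropy(subset)


def top_k_ngrams(articles, labels, k):
    """Return the k n-grams with the largest score (ties broken alphabetically)."""
    vocab = set().union(*articles)
    scores = {x: information_gain(articles, labels, x) for x in vocab}
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]


# Hypothetical toy corpus: each article is a set of unigrams.
articles = [
    {"surveillance", "nsa"},
    {"surveillance", "wiretap"},
    {"surveillance", "snowden"},
    {"surveillance", "snowden", "nsa"},
]
labels = ["T1", "T1", "T2", "T2"]  # corpus membership of each article

for x in sorted(set().union(*articles)):
    print(f"{x}: {information_gain(articles, labels, x):.2f}")
print(top_k_ngrams(articles, labels, k=2))
```

On this toy data, “snowden” appears only in T2 articles and scores 1 bit, whereas “surveillance” appears in every article and scores 0, which mirrors the intuition above.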
