To capture the topics, themes, and events associated with regional climate opinions, we extract data from Twitter user accounts. Twitter offers two well-documented Application Programming Interfaces (APIs) through which data can be accessed. The Standard API allows virtually unrestricted access to tweets produced in the last seven days, subject to a limit on the number of requests per hour. The Premium API provides access to the full Twitter archive but has much more restrictive rate limits, and is designed for users and organizations interested in a paid subscription.

In the interest of data accessibility, this analysis relies on the Standard API. By using tweets collected over a week-long period, we ensure that all of the information required to conduct the study can be collected without a Premium API subscription. One shortcoming of the Standard API is that it is not guaranteed to return every tweet in a given period of time. However, by taking advantage of parameters in Twitter’s search function (namely, specifying the ID of the most recent tweet to retrieve), the majority of tweets within a specific window can be retrieved. In two validation tests conducted over two separate three-hour windows, we found that the Standard API returned at least 98% of the tweets retrieved by the Premium API.
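
The validation comparison described above amounts to a set overlap between the tweet IDs returned by the two APIs over the same window. The following is a minimal sketch of that calculation, not the code used in the study; the function name `coverage` and the toy IDs are assumptions made for illustration.

```python
def coverage(standard_ids, premium_ids):
    """Fraction of Premium API tweet IDs that the Standard API also returned."""
    premium = set(premium_ids)
    if not premium:
        raise ValueError("premium_ids must not be empty")
    return len(premium & set(standard_ids)) / len(premium)


# Toy IDs for illustration only; these are not data from the study.
premium_ids = [101, 102, 103, 104, 105]
standard_ids = [101, 102, 104, 105, 999]
print(coverage(standard_ids, premium_ids))  # 0.8
```

Under this definition, a value of 0.98 or higher in each three-hour window would correspond to the result reported above.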

We use two data subsets in training the proposed predictive model: a large corpus to train the topic model (later referred to as the topic corpus) and a smaller, geo-located corpus to derive regional topic features (referred to as the regional corpus). Because of Twitter’s privacy policy, not all tweets can be geo-located, so the regional corpus is smaller than the topic corpus. Because tweets are generally short, however, it is important to maximize the sample size used to develop the topic model, hence the two distinct corpora.

The topic corpus consists of every tweet that matches the keywords “carbon” (excluding “carbon monoxide”), “climate”, or “global warming” (and excludes “RT”, Twitter’s representation of a retweet) in the region of interest (the US, in this case) over a seven-day period. In this study, the date range is April 18th through April 25th, 2019, notably encompassing Earth Day and the Extinction Rebellion in London. The following example query, modified to enhance readability, illustrates how the data were collected:

query = (global warming) or (climate) or (carbon)

excluding (monoxide) and (RT)

count = 100

tweet_mode = extended

max_id = [most recent tweet id]

where query gives the keywords being targeted, excluding specifies words to exclude from the results, count sets the number of tweets to return per request (100 is the maximum), tweet_mode controls whether the full, untruncated tweet text is returned, and max_id specifies the most recent tweet ID to be used in the query (allowing us to exclude tweets that have already been collected, as tweet IDs are issued chronologically).
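
To make the role of max_id concrete, the following sketch pages backwards through search results, lowering max_id after each request so that tweets already collected are not returned again. It is an illustration under stated assumptions rather than the collection code used in the study: `fetch_page` is a hypothetical callable standing in for a real request to the search endpoint, the query string is an approximate rendering of the readable query above, and `fake_fetch` simulates the API so the example runs offline.

```python
def build_params(max_id=None):
    """Assemble search parameters mirroring the readable query above."""
    params = {
        # Illustrative query string; the exact operator syntax is an assumption.
        "q": '"global warming" OR climate OR carbon -monoxide -RT',
        "count": 100,              # maximum tweets per request
        "tweet_mode": "extended",  # request full, untruncated text
    }
    if max_id is not None:
        params["max_id"] = max_id
    return params


def collect(fetch_page, max_pages=1000):
    """Page backwards in time until a request returns no tweets."""
    tweets = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(build_params(max_id))
        if not page:
            break
        tweets.extend(page)
        # max_id is inclusive, so step just below the oldest ID seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets


def fake_fetch(params):
    """Hypothetical stand-in for the API: 250 tweets, newest first."""
    upper = params.get("max_id", 250)
    matching = [{"id": i} for i in range(250, 0, -1) if i <= upper]
    return matching[: params["count"]]


tweets = collect(fake_fetch)
print(len(tweets))  # 250
```

Because tweet IDs increase over time, subtracting one from the smallest ID in each page is enough to avoid duplicates without tracking every ID already seen.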

After data acquisition, the final topic corpus includes roughly 350,000 tweets. The regional corpus consists of geo-located tweets that match the aforementioned criteria. The most precise form of geo-tagging is a tweet that includes the latitude and longitude where the tweet originated, but fewer than 1% of tweets include this information. Another form of geo-tagging is a tweet that is associated with a specific location or a tweet that originates from a user associated with a specific location (e.g., Chicago, Illinois). For these tweets to be returned by the query, the entire region as specified by Twitter must be encompassed by the search radius. To perform geo-tagging, the search criteria are updated to include geographic coordinates and a search radius as follows:

geo-code = latitude, longitude, radius

where latitude and longitude specify the geographic center of a county and radius is the radius from the center in which to look for tweets. For the tweet to be returned by the query, it must be geo-tagged. The final regional corpus includes 190,000 tweets.
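
As an illustration of the geographic criterion, the sketch below formats a county center and radius into a single "latitude,longitude,radius" value with a unit suffix of mi or km. The helper name `geocode_param` and the example coordinates and radius are assumptions chosen for illustration; they are not values taken from the study.

```python
def geocode_param(lat, lon, radius, unit="mi"):
    """Format a search center and radius as 'lat,lon,radius<unit>'."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    if radius <= 0:
        raise ValueError("radius must be positive")
    return f"{lat:.6f},{lon:.6f},{radius:g}{unit}"


# Hypothetical county center and radius, for illustration only.
print(geocode_param(41.84, -87.82, 25))  # 41.840000,-87.820000,25mi
```

One such value would be generated per county and added to the search parameters alongside the keyword query.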
