Quantification and statistical analysis

Risto Conte Keivabu
Tobias Widmann

In this article we capture a plausibly causal effect of temperature on language complexity using an estimation strategy that has been applied in similar studies.41,42 More precisely, the estimation strategy is shown in Equation 1:
Yipct = Σj βj TEMPjct + γ′Xct + μp + δmc + υd + εipct (Equation 1)
In the equation, Yipct denotes the outcome, the Flesch-Kincaid score, measured for speech i of politician p, in city c at date t. The temperature exposure is captured by TEMP, composed of the categories j: <0°C; 0°C to 3°C; 3°C to 6°C; 6°C to 9°C; 9°C to 12°C; 12°C to 18°C (comfort zone); 18°C to 21°C; 21°C to 24°C; 24°C to 27°C; >27°C, measured in city c at date t. As explained in the results section, we used politician-level fixed effects, μp, to isolate the effect of temperature variations on the speeches of individual politicians. The month-by-city fixed effects δmc account for seasonal patterns, and the day-of-week fixed effects υd rule out that our results are biased by possible weekly differences in speech quality. We also added a vector X for our control variables: wind speed, relative humidity, and precipitation. As in similar previous studies,43 we cluster standard errors at the month-by-city level, as this better captures the level of our treatment. It allows us to account for potential correlation of regression errors within the same city in a given month, and thereby avoids underestimating the variability of our estimates when weather patterns or local events affect multiple observations within the same city during the same month. Additionally, we tested robustness to alternative clustering at the politician level. Results with this alternative clustering strategy are consistent (Table S3).
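The specification in Equation 1 can be sketched as follows. The authors' actual software and variable names are not reported, so the column names, simulated data, and use of statsmodels here are illustrative assumptions only; the temperature bins and the 12°C to 18°C reference category follow the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated speech-level data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "fk_score": rng.normal(12, 2, n),        # Flesch-Kincaid grade of the speech
    "temp": rng.uniform(-5, 32, n),          # daily temperature in city c at date t (deg C)
    "wind": rng.uniform(0, 10, n),           # control: wind speed
    "humidity": rng.uniform(20, 90, n),      # control: relative humidity
    "precip": rng.exponential(1, n),         # control: precipitation
    "politician": rng.integers(0, 100, n),
    "city": rng.choice(["A", "B"], n),
    "month": rng.integers(1, 13, n),
    "dow": rng.integers(0, 7, n),
})

# Bin temperature into the categories of Equation 1,
# with 12-18C (comfort zone) reordered first as the omitted reference.
bins = [-np.inf, 0, 3, 6, 9, 12, 18, 21, 24, 27, np.inf]
labels = ["<0", "0-3", "3-6", "6-9", "9-12", "12-18", "18-21", "21-24", "24-27", ">27"]
df["temp_cat"] = pd.cut(df["temp"], bins=bins, labels=labels)
df["temp_cat"] = df["temp_cat"].cat.reorder_categories(
    ["12-18"] + [l for l in labels if l != "12-18"])

# Month-by-city cells, integer-coded for clustering.
df["mc"] = pd.factorize(df["month"].astype(str) + "_" + df["city"])[0]

# Politician, month-by-city, and day-of-week fixed effects as dummies;
# standard errors clustered at the month-by-city level.
model = smf.ols(
    "fk_score ~ C(temp_cat) + wind + humidity + precip"
    " + C(politician) + C(mc) + C(dow)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["mc"]})

print(model.params.filter(like="temp_cat"))
```

With real data, the coefficients on the nine non-reference bins trace out the temperature-complexity profile relative to the comfort zone.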

For our marginal-effects analysis for Germany, we use the same approach as in our main analysis but add an interaction between our temperature measures and the gender and age categories. The estimation strategy is shown in Equation 2:
Yipct = Σj βj TEMPjct + Σj ηj (TEMPjct × DEMOp) + γ′Xct + μp + δmc + υd + εipct (Equation 2)
Here, the temperature categories are interacted with the demographic variables DEMO, which represent the age categories in Figure 2 and the gender of politician p in Figure 3. As described in the results section, we use four age categories based on quartiles of the age distribution of politicians in our sample. Here, we cluster standard errors at the politician level, as done in previous studies,13,42 because of the limited number of clusters at the city level (speeches were held only in Bonn at first and only in Berlin from 1999) and the problems of clustering by month, since treatment allocation in one month may be similar to that in adjacent months. Nevertheless, we acknowledge the limitations of this approach, which could produce Type 1 errors due to spatial autocorrelation and understate standard errors. The remainder of the model resembles Equation 1, as we added month-by-city FE, day-of-week FE, and politician FE. However, due to the milder climate of Germany, we use only 9 temperature categories: <0°C; 0°C to 3°C; 3°C to 6°C; 6°C to 9°C; 9°C to 12°C; 12°C to 18°C (comfort zone); 18°C to 21°C; 21°C to 24°C; >24°C.
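The interaction model in Equation 2 differs from Equation 1 mainly in the formula and the clustering level. A minimal sketch, again with hypothetical column names and simulated data (and, for brevity, fewer temperature bins than the text and per-speech rather than per-politician demographics):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1500
df = pd.DataFrame({
    "fk_score": rng.normal(12, 2, n),
    # Comfort zone listed first so it serves as the reference category.
    "temp_cat": pd.Categorical(rng.choice(["12-18", "<0", ">24"], n),
                               categories=["12-18", "<0", ">24"]),
    # Age quartiles drawn per speech purely for illustration.
    "age_q": rng.choice(["q1", "q2", "q3", "q4"], n),
    "politician": rng.integers(0, 80, n),
    "mc": rng.integers(0, 24, n),   # month-by-city cell
    "dow": rng.integers(0, 7, n),
})

# Temperature categories interacted with the demographic variable (age
# quartiles here); standard errors clustered at the politician level.
model = smf.ols(
    "fk_score ~ C(temp_cat) * C(age_q) + C(politician) + C(mc) + C(dow)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["politician"]})

print(model.params.filter(like=":"))  # interaction coefficients
```

The interaction coefficients give the additional temperature effect for each demographic group relative to the reference group and comfort zone.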

Our results are robust to a set of sensitivity analyses. First, as stated in the results section, we used different sets of model specifications (Table S3). More precisely, we tested the introduction of linear and cubic time trends constructed from a continuous value for the week of the speech, a continuous month-by-city trend, and a less restrictive model specification with year-by-city FE. The results hold in all models, but effect sizes and significance are smaller in more conservative modeling approaches than in less restrictive ones. Such results are expected, as more restrictive models leave less variation to identify the effect. Nevertheless, despite these analyses, we acknowledge that we are not able to fully capture week-of-study-specific time trends that could still partly bias our results. Second, we used additional operationalizations of language complexity: instead of the Flesch-Kincaid score, as mentioned previously, we used the Flesch and Rix scores (Figure S6). We also used the temperature on the three days before and after the speech to test for lagged effects. In Figure S7, we observe an effect of hot days only on the day of the speech and not on the days before or after, which corroborates our findings. Finally, to investigate whether changes in language complexity are driven by a change in the topics discussed, we replicated the German analysis while controlling for speech topic.

To measure topics in parliamentary debates, we ran a structural topic model.72 Topic modeling, a subset of machine learning and natural language processing (NLP), provides a way of distilling large volumes of text into smaller topic groups. This method operates under the assumption that documents are mixtures of topics, where a topic is defined by a probability distribution over words.
Thus, it identifies clusters of words that frequently appear together across the corpus and interprets these clusters as indicative of different "topics." To do so, it relies on an iterative process built around two key distributions: the probability of topics given a document (the document-topic distribution) and the probability of words given a topic (the topic-word distribution). At the start, words are randomly assigned to topics; as the algorithm iterates, it refines these assignments based on the context provided by the document-topic and topic-word distributions. The goal is to maximize the likelihood that the observed words in the documents are generated by these distributions. Through this iterative refinement, words that frequently co-occur across the corpus naturally cluster into topics, while documents that share similar words strengthen their associations with those topics.

For our specific application, we did not predetermine the number of topics (k), which is a common requirement in many topic modeling applications. Instead, we adopted the approach of Lee and Mimno (2017), which employs spectral initialization techniques.72 This method stands out by using spectral decomposition of the document-feature matrix to infer the latent topic structure without prior specification of the number of topics. This approach circumvents the often subjective and iterative process of selecting an appropriate k value, thereby enhancing the objectivity and reproducibility of the topic modeling process. Upon applying this method, we identified 107 distinct topics within the parliamentary debate texts. This reveals the wide range of subjects covered in these debates, underscoring the complexity of legislative discourse. After the topic model was applied, we needed to relate the topics back to individual speeches to understand the main focus of each speech. To accomplish this, we used the highest θ value for each document, which represents the probability of each topic given the document. Assigning topics based on the highest θ value ensures that each speech is associated with the topic it most strongly relates to. Replicating the analysis with the categorical topic variable included, we observe a slightly smaller effect size compared to the model without such a control (Table S9). Nevertheless, the coefficient remains negative and statistically significant, substantiating our main results.
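The highest-θ assignment step amounts to a row-wise argmax over the document-topic matrix; the matrix below is made up for illustration:

```python
import numpy as np

# Illustrative document-topic matrix (theta): one row per speech,
# one column per topic, each row summing to one.
theta = np.array([
    [0.70, 0.20, 0.10],
    [0.15, 0.25, 0.60],
    [0.05, 0.90, 0.05],
])

# Assign each speech the topic with the highest theta value.
main_topic = theta.argmax(axis=1)
print(main_topic)  # -> [0 2 1]
```

The resulting integer labels can then enter the regression as a categorical topic control.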
