Data Analysis

Darlene A. Kertes, Cherita Clendinen, Ke Duan, Jill A. Rabinowitz, Christopher Browning, Peter Kvam

Statistical analyses were conducted in R version 4.0.5. To accommodate missing item-level data, multiple imputation was performed prior to statistical modeling using the "mice" package (van Buuren & Groothuis-Oudshoorn, 2011), with polytomous logistic regression for categorical variables and predictive mean matching for continuous variables (Little, 1988; Venables & Ripley, 2002). All statistical models using overall measures/sum scores were carried out with Bayesian analyses, as described below. Because statistical conclusions were drawn from differences between the posterior (estimates after considering the data) and the prior (estimates before considering the data), the prior itself served as a distribution of possible values for data missing at the scale level. This ensured that missing measure-level data had no effect on the conclusions and a minimal effect on the coefficient estimates, without requiring imputation at that level.
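As a minimal sketch of this imputation step, the following R code assigns predictive mean matching to numeric columns and polytomous logistic regression to categorical columns before imputing. The data frame name (items), the number of imputations (m = 5), and the seed are illustrative assumptions, not values taken from the manuscript.

```r
## Item-level multiple imputation with "mice" (illustrative sketch).
library(mice)

## mice defaults to "pmm" (predictive mean matching) for numeric columns
## and "polyreg" (polytomous logistic regression) for unordered factors;
## setting the method vector makes those choices explicit.
meth <- make.method(items)
meth[sapply(items, is.numeric)] <- "pmm"
meth[sapply(items, is.factor)]  <- "polyreg"

imp <- mice(items, m = 5, method = meth, seed = 123, printFlag = FALSE)
completed <- complete(imp, action = 1)  # extract the first completed data set
```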

To determine the association of the predictors with the outcomes, a Bayesian regression model was implemented using JAGS (Just Another Gibbs Sampler; Plummer, 2004; Kruschke, 2014) in MATLAB and R. Bayesian methods offer several advantages over traditional regression approaches, including the ability to quantify uncertainty as probability distributions over competing models (e.g., models with vs. without telomere length as a predictor of anxiety) and parameters (e.g., the value of the coefficient predicting anxiety from telomere length), which cannot be obtained from classical statistical approaches. Specifically, they estimate a "posterior" distribution, i.e., a probability distribution specifying the relative likelihoods of all possible parameter values (the regression coefficients) given the data. The sampling procedure (JAGS) draws randomly from the posterior in proportion to the posterior likelihoods of different parameter values: more likely values (e.g., values near zero, in the case of a null effect) yield many samples near that value, while less likely values yield few. Each chain corresponds to one sequence of samples from the posterior; when chains converge, they agree on the shape of the posterior. In simple terms, to test the likelihood that a particular coefficient is zero, the frequency of posterior samples near zero (indicating a null effect) was evaluated relative to samples far from zero (indicating a large effect). Bayesian methods like these have been used in thousands of psychology articles and have been shown to generate more valid and reliable conclusions than classical regression/null hypothesis testing (van de Schoot et al., 2017). By using the posterior, these approaches quantify the likelihood of the null and of various alternative hypotheses, rather than the likelihood of the data assuming the null hypothesis is true (Wagenmakers, 2007; Wagenmakers et al., 2018), leading to both more intuitive conclusions and more flexible data analyses.
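The sketch below shows the general shape of such a JAGS regression run from R via the "rjags" package. It assumes a numeric outcome vector y and a predictor matrix X, and uses standard-normal priors on the coefficients as described later in this section; it is an illustrative reconstruction, not the authors' script.

```r
## Bayesian linear regression via JAGS (illustrative sketch).
library(rjags)  # also loads coda for diagnostics

model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- b0 + inprod(b[], X[i, ])
  }
  b0 ~ dnorm(0, 1)            # standard-normal prior (JAGS parameterizes by precision)
  for (j in 1:P) { b[j] ~ dnorm(0, 1) }
  tau ~ dgamma(0.01, 0.01)    # vague prior on residual precision
}"

jags_data <- list(y = y, X = X, N = nrow(X), P = ncol(X))
model <- jags.model(textConnection(model_string), data = jags_data,
                    n.chains = 4)
update(model, 1000)                                # burn-in
post <- coda.samples(model, c("b0", "b"), n.iter = 5000)
gelman.diag(post)                                  # r-hat convergence check
```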

The Bayesian regression included each of the predictors outlined in Table 2, along with the interaction of each of these variables with T/S ratio. The effects of the set of predictors on anxiety and depressive symptoms were estimated in separate models. To estimate coefficients in the regression model and compare competing hypotheses about model structure, the posterior distribution was estimated across all parameters using a Gibbs sampler (Plummer, 2004), with 4 chains of 5,000 samples each. All chains reached convergence (r-hat statistic < 1.005; Gelman & Shirley, 2011). This provided 20,000 random samples from the posterior distribution for each parameter of the model. From these samples, a 95% highest density interval (HDI) was computed, specifying the 95% most likely values of each coefficient/parameter in the model. The HDI is more intuitive than a classical confidence interval, which must strictly be interpreted as the range of values within which a sample mean would be expected to fall 95% of the time if the experiment were repeated many times.

From the posterior samples, a Bayes factor was computed for the inclusion of each coefficient in the model by comparing the height of the prior distribution against the height of the posterior distribution at b = 0, referred to as a (generalized) Savage-Dickey Bayes factor (Wagenmakers et al., 2010; Heck, 2019). The estimation procedure used a standard normal distribution as the prior for each coefficient to ensure that this comparison reflected optimal inferences based on the data (Consonni et al., 2018). The Bayes factor quantifies the odds of a coefficient being nonzero, given the available data. These odds can be transformed into a probability of inclusion, Pr(incl), where larger values indicate that a parameter is more likely to be nonzero. Values of Pr(incl) below 0.5 support the null/excluding a parameter from the model, whereas values above 0.5 support inclusion/non-null values of a parameter. Both the 95% HDI for each estimated parameter and its probability of inclusion based on the Bayes factor are reported. A credible effect in Bayesian regression is one where Pr(incl) > 0.5 and the HDI does not cross zero. The effects of age, sex, and season of data collection were controlled for by assigning them a prior inclusion probability of 1 (i.e., they were always included as factors in the model), to account for demographic and seasonal influences shown in prior work in our cohort and others to covary with telomere length (Kertes et al., 2022; Ly et al., 2019).
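To make these quantities concrete, the following R sketch computes a 95% HDI, a Savage-Dickey Bayes factor, and Pr(incl) from the posterior samples of a single coefficient. Here "draws" is a hypothetical numeric vector of 20,000 posterior samples; the density-based Savage-Dickey approximation is one common implementation, not necessarily the exact computation the authors used.

```r
## 95% HDI: the shortest interval containing 95% of the posterior draws
hdi95 <- function(draws, mass = 0.95) {
  s <- sort(draws)
  n <- length(s)
  k <- floor(mass * n)
  widths <- s[(k + 1):n] - s[1:(n - k)]   # width of every candidate interval
  i <- which.min(widths)                  # index of the narrowest one
  c(lower = s[i], upper = s[i + k])
}

## Savage-Dickey Bayes factor: prior vs. posterior density at b = 0
d <- density(draws)
post_at_0  <- approx(d$x, d$y, xout = 0)$y
prior_at_0 <- dnorm(0, mean = 0, sd = 1)  # standard-normal prior
bf_incl <- prior_at_0 / post_at_0         # odds the coefficient is nonzero
pr_incl <- bf_incl / (1 + bf_incl)        # probability of inclusion

hdi95(draws); pr_incl
```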

Descriptive statistics

In reporting the results, common Bayesian analysis reporting guidelines (e.g., Kruschke, 2014) were followed. Sensitivity analyses for different priors can be carried out using the priors and computations contained in the analysis code, provided at https://github.com/KertesLab/manuscript_code_repository. A generic sketch of such a check appears below.
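The loop below illustrates one way such a prior-sensitivity check could look: re-estimating Pr(incl) under narrower and wider coefficient priors and comparing the results. It is a generic sketch, not the repository code; "fit_model" is a hypothetical wrapper around the JAGS model sketched earlier that returns posterior draws for one coefficient given a prior standard deviation.

```r
## Prior-sensitivity check (generic sketch; fit_model is hypothetical)
for (prior_sd in c(0.5, 1, 2)) {
  draws <- fit_model(y, X, prior_sd = prior_sd)  # posterior draws, one coefficient
  d <- density(draws)
  post_at_0 <- approx(d$x, d$y, xout = 0)$y
  bf <- dnorm(0, 0, prior_sd) / post_at_0        # Savage-Dickey under this prior
  cat(sprintf("prior sd = %.1f, Pr(incl) = %.3f\n", prior_sd, bf / (1 + bf)))
}
```

If the conclusions (e.g., whether Pr(incl) crosses 0.5) are stable across these priors, the inferences are robust to the prior choice.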
