Statistics and reproducibility

Hilma Holm
Erna V. Ivarsdottir
Thorhildur Olafsdottir
Rosa Thorolfsdottir
Elias Eythorsson
Kristjan Norland
Rosa Gisladottir
Gudrun Jonsdottir
Unnur Unnsteinsdottir
Kristin E. Sveinsdottir
Benedikt A. Jonsson
Margret Andresdottir
David O. Arnar
Asgeir O. Arnthorsson
Kolbrún Birgisdottir
Kristbjorg Bjarnadottir
Solveig Bjarnadottir
Gyda Bjornsdottir
Gudmundur Einarsson
Berglind Eiriksdottir
Elisabet Eir Gardarsdottir
Thorarinn Gislason
Magnus Gottfredsson
Steinunn Gudmundsdottir
Julius Gudmundsson
Kristbjorg Gunnarsdottir
Anna Helgadottir
Dadi Helgason
Ingibjorg Hinriksdottir
Ragnar F. Ingvarsson
Sigga S. Jonasdottir
Ingileif Jonsdottir
Tekla H. Karlsdottir
Anna M. Kristinsdottir
Sigurdur Yngvi Kristinsson
Steinunn Kristjansdottir
Thorvardur J. Love
Dora Ludviksdottir
Gisli Masson
Gudmundur Norddahl
Thorunn Olafsdottir
Isleifur Olafsson
Thorunn Rafnar
Hrafnhildur L. Runolfsdottir
Jona Saemundsdottir
Svanur Sigurbjornsson
Kristin Sigurdardottir
Engilbert Sigurdsson
Martin I. Sigurdsson
Emil L. Sigurdsson

With the C19Q we assessed both the presence and frequency of symptoms. For ease of interpretation, we treated the answers as binary traits in logistic regression (in general, absent or very infrequent symptom vs. other) and present ORs. As a robustness check, we also analyzed the answers as quantitative traits, which yielded results similar to those from logistic regression.

We tested all measures for association with SARS-CoV-2 infection, adjusting for age and sex. Additional adjustment for comorbidities (obesity, hypertension, asthma, type 2 diabetes, cancer, and coronary artery disease) had minimal effect on the associations; we therefore report results without comorbidity adjustment and show both sets of results in Supplementary Data. We applied two complementary study designs to balance statistical power against screening for confounding effects when testing for association between SARS-CoV-2 infection and the numerous health-related traits.

(A) We compared outcome measures of cases with all available control data. To account for time effects in (A), we tested for differences in measures between (i) cases and contemporary controls (restricting data to measures taken during the pandemic) and (ii) contemporary and historical controls, where divergence from a null finding could indicate a time effect in (A). We further plotted the data on physiological and blood traits against time of measure to explore batch effects. We observed batch effects for the hearing test, oxygen saturation, grip strength, and blood tests. For those traits, we restricted the control data either to more recent historical measures (after 2017 for grip strength and after 2019 for hearing) or to contemporary controls only (oxygen saturation and blood tests; Supplementary Figure 4, Supplementary Methods).

(B) We used a subset of the data that allows for a controlled before-and-after study, i.e., longitudinal measures for the same individuals collected before and during the pandemic (before and after infection for cases, with a similar time interval between repeat measures for controls). This design has the added benefit of accounting simultaneously for time effects and time-invariant individual heterogeneity, at the cost of reduced power.

When testing for association between SARS-CoV-2 infection and other phenotypes, we used logistic regression for binary traits and linear regression for quantitative traits. The physiological traits were regressed against SARS-CoV-2 status (1 for SARS-CoV-2 cases, 0 for controls), adjusting for age at time of measure and sex. The cognitive traits were similarly regressed against SARS-CoV-2 status, adjusting for age at time of measure, sex, and level of education obtained from the online questionnaire. Level of education was coded as a quantitative variable ranging from zero to six: zero for no education, one for primary school, two for high school, three for other secondary education, four for an undergraduate university degree, five for a master's degree, and six for a doctoral degree. When testing for association between phenotypes and severity of the acute infection, the traits were regressed against the severity scale. P values for all regression analyses were obtained with a likelihood ratio test. Individuals with missing data were excluded from the association analyses. To test the sensitivity of the results to comorbidity, we tested for association between phenotypes and SARS-CoV-2 infection as described above with indicator variables added for obesity, coronary artery disease, type 2 diabetes, asthma, hypertension, and cancer, restricting the data to observations with non-missing comorbidity indicators. When comparing measurements for individuals who participated in the dHS both before and during the pandemic, we subtracted pre-pandemic measures from those taken during the pandemic for both cases and controls and regressed the difference against SARS-CoV-2 status using linear regression, adjusting for age, sex, and time between measures.
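The before-and-after comparison with a likelihood ratio test can be sketched as follows. The original analyses were performed in R; this is a minimal standard-library Python illustration on simulated data, not the authors' code. The function names (`ols_fit`, `rss`, `lrt_pvalue_status`), the simulated effect size, and the use of the Gaussian likelihood ratio statistic n·ln(RSS_null/RSS_full) with one degree of freedom are assumptions made for the example.

```python
import math
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, beta):
    """Residual sum of squares for a fitted linear model."""
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def lrt_pvalue_status(rows, diffs):
    """rows: [status, age, sex, years_between]; diffs: during minus pre-pandemic measure.
    Returns the status effect and a likelihood ratio test P value (chi-square, 1 df)."""
    full = [[1.0] + list(r) for r in rows]      # intercept + status + covariates
    null = [[1.0] + list(r[1:]) for r in rows]  # same model without status
    n = len(diffs)
    b_full = ols_fit(full, diffs)
    b_null = ols_fit(null, diffs)
    stat = n * math.log(rss(null, diffs, b_null) / rss(full, diffs, b_full))
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi-square survival function, 1 df
    return b_full[1], p

# Simulated data (hypothetical): cases shift by 0.5 units between visits, controls do not.
random.seed(1)
rows, diffs = [], []
for i in range(400):
    status = i % 2
    age = random.uniform(20, 80)
    sex = random.randint(0, 1)
    years = random.uniform(1, 4)
    pre = random.gauss(50, 5)
    during = pre + 0.5 * status + random.gauss(0, 0.5)
    rows.append([status, age, sex, years])
    diffs.append(during - pre)

effect, p_value = lrt_pvalue_status(rows, diffs)
print(f"status effect = {effect:.3f}, LRT P = {p_value:.2e}")
```

Differencing removes anything that is constant within a person, which is why this design accounts for time-invariant individual heterogeneity.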

For two of the cognitive traits, TMT-A and TMT-B, we log-transformed the test scores because of right-skewness in their distributions. Test scores for each of the 12 cognitive traits were then adjusted for age, gender, and education in a linear model prior to rank-based inverse normal transformation. The normalized and adjusted cognitive test scores were then used in association analyses to report effect estimates in SD units.
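A rank-based inverse normal transformation can be sketched as below. This is an illustrative Python version, not the authors' R code. The rank offset of 0.5 and the averaging of tied ranks are assumptions, since the text does not state which variant was used, and the input values are hypothetical covariate-adjusted residuals.

```python
from statistics import NormalDist

def rank_inverse_normal(values, offset=0.5):
    """Map values to normal quantiles by rank; ties receive their average rank.
    The offset (0.5 here) is an assumed convention, not taken from the source."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / n) for r in ranks]

# Hypothetical residuals of a test score after adjustment for age, gender, and education.
adjusted_scores = [-0.31, 0.12, -0.05, 1.40, 0.27]
z = rank_inverse_normal(adjusted_scores)
print([round(v, 3) for v in z])
```

Because the transformed scores follow an approximately standard normal distribution, regression coefficients estimated on them can be read in SD units.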

We calculated the frequency of memory impairment, defined as a z-score less than or equal to −1.5, among cases and controls for the logical memory measures, adjusting for age, gender, and educational level. Since cases and controls are not perfectly matched on those three covariates, we estimated the covariate parameters using the control data as the training set and applied the corresponding measure-specific parameters to create z-scores for both cases and controls.
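The control-trained z-score procedure can be sketched as follows. This is a simplified Python illustration with simulated controls, not the authors' R code. The linear form of the covariate model, the residual standard deviation as the scaling factor, and all names and numbers are assumptions made for the example.

```python
import math
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def control_norm_z(controls, everyone):
    """Fit score ~ age + sex + education on controls only, then express each
    person's score as a z-score relative to the control-based expectation."""
    Xc = [[1.0, a, s, e] for _, a, s, e in controls]
    yc = [sc for sc, _, _, _ in controls]
    beta = ols_fit(Xc, yc)

    def predict(a, s, e):
        return beta[0] + beta[1] * a + beta[2] * s + beta[3] * e

    resid = [sc - predict(a, s, e) for sc, a, s, e in controls]
    sd = math.sqrt(sum(r * r for r in resid) / (len(controls) - 4))
    return [(sc - predict(a, s, e)) / sd for sc, a, s, e in everyone]

# Simulated controls (hypothetical): tuples of (score, age, sex, education 0-6).
random.seed(2)
controls = []
for _ in range(300):
    age = random.uniform(20, 80)
    sex = random.randint(0, 1)
    edu = random.randint(0, 6)
    score = 30 - 0.1 * age + 1.0 * edu + random.gauss(0, 2)
    controls.append((score, age, sex, edu))

# Two hypothetical cases with identical covariates but different scores.
cases = [(20.0, 50.0, 1, 3), (28.0, 50.0, 1, 3)]
z_cases = control_norm_z(controls, cases)
flags = [z <= -1.5 for z in z_cases]  # impairment defined as z <= -1.5
print(flags)
```

Training on controls only keeps the reference norms independent of any effect of infection on the cases' scores.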

To establish association with SARS-CoV-2 infection, we required (1) association when comparing cases with all available controls, using the following thresholds to account for multiple testing: P = 0.05/96 = 5 × 10⁻⁴ for health and symptom questionnaire data, P = 0.05/87 = 6 × 10⁻⁴ for physiological measures and cognitive tests, and P = 0.05/63 = 8 × 10⁻⁴ for blood tests; and (2) at least one of the following, as statistical power varies across measures: (i) association results consistent with (1) (same direction and no heterogeneity in the effect estimates) when comparing cases with contemporary controls, or (ii) association results consistent with (1) when comparing the subset of cases with the subset of controls with longitudinal measures. For symptoms and objective measures that were associated with prior SARS-CoV-2 infection, we required P < 0.05 to establish a correlation with time from infection and with the severity of the infection. For measures that were not associated with prior SARS-CoV-2 infection per se, we required the same multiple-testing thresholds as listed above to establish an association with the severity of the infection. We used multiple linear or logistic regression for association testing. Analyses were performed in R, version 3.6.0.
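The two-part decision rule can be sketched as a small Python function. The Bonferroni thresholds follow the numbers given above. The function names, the dictionary keys, and the 0.05 cutoff for "no heterogeneity" are assumptions for illustration; the text does not specify the heterogeneity test or its threshold.

```python
ALPHA = 0.05
# Number of tests per data category, as stated in the text.
N_TESTS = {"questionnaire": 96, "physiological_cognitive": 87, "blood": 63}
THRESHOLDS = {name: ALPHA / n for name, n in N_TESTS.items()}

HET_ALPHA = 0.05  # assumed cutoff for calling two effect estimates heterogeneous

def consistent(beta_all, beta_other, het_p):
    """Same direction of effect and no evidence of heterogeneity."""
    same_direction = beta_all * beta_other > 0
    return same_direction and het_p >= HET_ALPHA

def association_established(category, p_all, beta_all, follow_ups):
    """follow_ups: (beta, heterogeneity_p) pairs from the contemporary-control
    and longitudinal analyses; at least one must be consistent with (1)."""
    if p_all >= THRESHOLDS[category]:
        return False  # criterion (1) fails
    return any(consistent(beta_all, b, hp) for b, hp in follow_ups)

for name, thr in THRESHOLDS.items():
    print(f"{name}: P < {thr:.1e}")

# Hypothetical results for three traits.
example_pass = association_established("blood", 1e-5, 0.30, [(0.25, 0.60), (-0.05, 0.01)])
example_fail_p = association_established("blood", 1e-3, 0.30, [(0.25, 0.60)])
example_fail_consistency = association_established(
    "questionnaire", 1e-6, 0.30, [(-0.20, 0.50), (0.10, 0.01)])
```

Requiring only one of the two follow-up analyses to agree reflects the statement that statistical power varies across measures.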
