Statistical analyses

Melanie Bannister-Tyrrell; Set Srun; Vincent Sluydts; Charlotte Gryseels; Vanna Mean; Saorin Kim; Mao Sokny; Koen Peeters Grietens; Marc Coosemans; Didier Menard; Sochantha Tho; Wim Van Bortel; Lies Durnez



This method is extracted from research article: Sci Rep, Aug 2018

Importance of household-level risk factors in explaining micro-epidemiology of asymptomatic malaria infections in Ratanakiri Province, Cambodia

DOI: 10.1038/s41598-018-30193-3


Two-level logistic regression models with a random intercept for each household were used to model the odds of individual Plasmodium infection. The multilevel analysis had two objectives: first, to identify risk factors for Plasmodium infection at individual and household level; second, to assess whether differences in the odds of Plasmodium infection between households and between villages remained unexplained after adjustment for individual and household risk factors. To achieve these objectives, a stepwise modelling approach was used, as follows. First, a null model with household random intercepts only was fitted to calculate the crude household-level variation in the odds of Plasmodium infection. Second, crude odds ratios and 95% confidence intervals were calculated for each individual-level exposure variable, and variables with p < 0.20 were included in a combined multivariable model. Manual backwards stepwise selection was then used to retain variables significant at p < 0.05, with age retained as an a priori confounder. Third, village was added as a fixed effect to determine whether individual-level variables explained variation in the odds of Plasmodium infection between villages. Fourth, household-level variables were selected in the same way and added to the individual-level model, and the explained between-household variance was determined. Finally, village was added to the final model as a fixed effect, and its significance was assessed.
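The selection rules described above (screening at p < 0.20, then backwards elimination at p < 0.05 with age always retained) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the authors' procedure, which was carried out manually in Stata. The function names, the variable names in the usage example, and the `pvalues_for` callable (assumed to refit the random-intercept model for a given set of variables and return one p-value per variable) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def screen_candidates(crude_p: Dict[str, float], threshold: float = 0.20) -> List[str]:
    """Keep variables whose crude (single-variable) p-value is below the threshold."""
    return [name for name, p in crude_p.items() if p < threshold]


def backward_select(
    candidates: Sequence[str],
    pvalues_for: Callable[[List[str]], Dict[str, float]],
    forced: Sequence[str] = ("age",),
    alpha: float = 0.05,
) -> List[str]:
    """Drop the least significant non-forced variable until all remaining have p < alpha.

    `pvalues_for` is a hypothetical callback: given the variables currently in
    the model, it refits the model and returns a p-value for each of them.
    """
    kept = list(candidates)
    # A priori confounders are always in the model, whatever their p-value.
    for name in forced:
        if name not in kept:
            kept.append(name)
    while True:
        pvals = pvalues_for(kept)
        removable = {v: pvals[v] for v in kept if v not in forced}
        if not removable:
            return kept
        worst = max(removable, key=removable.get)
        if removable[worst] < alpha:
            return kept
        kept.remove(worst)
```

For example, with made-up crude p-values `{"age": 0.50, "sex": 0.01, "bednet": 0.15, "forest": 0.30}`, `screen_candidates` keeps `sex` and `bednet`, and `backward_select` then adds `age` back as a forced variable before elimination starts.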

Challenges in interpreting the results of multilevel logistic regression models include the interpretation of cluster-specific odds ratios and the appropriate interpretation of the variance of the cluster-level effect within and between models. These were addressed as follows:

The covariate odds ratios in multilevel logistic regression models have a cluster-specific interpretation⁴⁶. For individual-level characteristics, the odds ratios reflect the change in odds of the outcome at each level of the exposure variable for individuals within the same cluster, conditional on all other covariates. For cluster-level effects, however, the interpretation is the change in odds of the outcome at each level of the exposure variable relative to the baseline category within the same cluster (and conditional on covariates), which is problematic because cluster-level values are constant for all individuals within a cluster⁴⁶. To address the difficulty in interpreting household-specific fixed effects, the approximate marginal (i.e. population-average) effects were estimated by calculating a shrinkage factor (equation 2 in ref. 46) and applying it to the household-level log-odds ratios; these estimates are presented as supplementary to the main results.
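A minimal sketch of such a shrinkage calculation follows. It assumes the widely used logistic-normal approximation, in which the marginal log-odds ratio is the cluster-specific log-odds ratio divided by sqrt(1 + c²σ²), with c = 16√3/(15π) and σ² the household random-intercept variance. Whether this is exactly the form of equation 2 in ref. 46 is an assumption here, and the function names are illustrative.

```python
import math

# Constant of the logistic-normal approximation: c = 16 * sqrt(3) / (15 * pi).
# Assumption: this is the approximation behind the shrinkage factor in the text.
C = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)


def shrinkage_factor(household_variance: float) -> float:
    """Factor (between 0 and 1) that shrinks a conditional log-odds ratio."""
    return 1.0 / math.sqrt(1.0 + C**2 * household_variance)


def marginal_odds_ratio(conditional_or: float, household_variance: float) -> float:
    """Approximate population-average odds ratio from a cluster-specific one."""
    log_or = math.log(conditional_or)
    return math.exp(log_or * shrinkage_factor(household_variance))
```

The shrinkage is applied on the log-odds scale, so marginal odds ratios are pulled towards 1 as the household-level variance grows, and are unchanged when that variance is zero.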

The relative importance of individual-level compared to household-level effects in each model was assessed in two ways. First, the intra-class correlation coefficients (ICC) were calculated at each stage of model adjustment. The latent response formulation was used to estimate the between-subject variance on the log-odds scale. This formulation assumes that underlying the observed binary outcome (Plasmodium infection) is a latent continuous variable for the propensity to be infected with Plasmodium, which manifests as a binary variable once an unobserved threshold is reached⁴⁶. This approach allows both the cluster-level and individual-level variance to be measured on the log-odds scale, with the between-individual variance following the logistic distribution and fixed at π²/3. Thus, the ICC, which expresses the proportion of total variance that is due to the household-level variance, can be estimated. Likelihood ratio tests were used to assess the null hypothesis that the ICC was equal to zero, i.e., that there was no evidence of household-level clustering. Second, to complement the interpretation of the ICC, median odds ratios (MOR) were calculated⁴⁷. MORs are a function of the household-level variance only, and can be interpreted as the increased median odds of Plasmodium infection if an individual living in one household moved to a household with higher odds of Plasmodium infection, conditional on covariates⁴⁸. A MOR of 1 indicates no difference between households. Because it is expressed as an odds ratio, the magnitude of the household-level random effect can be compared directly to the magnitude of the individual-level or household-level risk factor odds ratios (or the inverse of the odds ratio, for protective factors)⁴⁸.
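Both quantities follow directly from the estimated household-level variance. The sketch below uses the standard formulas under the latent response formulation, ICC = σ²/(σ² + π²/3) and MOR = exp(√(2σ²) · Φ⁻¹(0.75)), where σ² is the random-intercept variance on the log-odds scale. The function names are illustrative, and the source itself obtained these values from Stata rather than from code like this.

```python
import math
from statistics import NormalDist

# Individual-level variance of the standard logistic distribution.
LEVEL1_VARIANCE = math.pi**2 / 3


def icc(household_variance: float) -> float:
    """Share of total latent variance attributable to households."""
    return household_variance / (household_variance + LEVEL1_VARIANCE)


def median_odds_ratio(household_variance: float) -> float:
    """Median odds ratio between two randomly chosen households."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * household_variance) * z75)
```

A household variance of zero gives an ICC of 0 and a MOR of 1, matching the statement that a MOR of 1 indicates no difference between households.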

The proportion of explained variance at successive stages of model adjustment was calculated as the difference between the variance of the random effect in the null model and the variance of the random effect in each adjusted model, expressed as a proportion of the variance of the random effect in the null model. However, as a consequence of the fixed between-individual variance in the latent response formulation, adjustment for individual-level covariates rescales the cluster-level variance, such that household-level variance may appear to increase⁴⁶. Thus, a variance scaling factor (the total variance of the null model as a proportion of the total variance of the adjusted model) was applied to rescale the random-effect variance of the adjusted models to the scale of the null model, which permits direct comparison at each stage of model adjustment⁴⁶.
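The rescaling and the proportion of explained variance can be sketched as follows. The text defines the scaling factor only as a ratio of total variances; this sketch assumes that the total variance of an adjusted model is the sum of the variance of its fixed-part linear predictor, its household-level variance, and π²/3, and that the null model has no fixed-part variance. That decomposition, and all function and parameter names, are assumptions rather than details confirmed by the source.

```python
import math

LEVEL1_VARIANCE = math.pi**2 / 3  # fixed individual-level variance (logistic)


def variance_scaling_factor(
    null_household_var: float,
    adj_household_var: float,
    adj_fixed_part_var: float,
) -> float:
    """Total variance of the null model divided by that of the adjusted model.

    Assumption: adjusted total = variance of the fixed-part linear predictor
    + household-level variance + pi^2 / 3.
    """
    total_null = null_household_var + LEVEL1_VARIANCE
    total_adj = adj_fixed_part_var + adj_household_var + LEVEL1_VARIANCE
    return total_null / total_adj


def proportion_explained(
    null_household_var: float,
    adj_household_var: float,
    adj_fixed_part_var: float,
) -> float:
    """Proportional reduction in household variance after rescaling."""
    factor = variance_scaling_factor(
        null_household_var, adj_household_var, adj_fixed_part_var
    )
    rescaled = factor * adj_household_var  # adjusted variance on the null scale
    return (null_household_var - rescaled) / null_household_var
```

With these definitions, an adjusted model whose unscaled household variance equals that of the null model still shows a positive proportion explained once the covariates contribute fixed-part variance, which is the effect the rescaling is meant to recover.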

In the analyses, frequency variables were first fitted as categorical variables, and then as frequency-weighted continuous variables for comparison; the latter were preferred as long as effect estimates for covariates did not shift meaningfully (by more than 10%) compared to fitting categorical variables. Total number of farm houses and total number of village houses were fitted as linear variables. Missing data were addressed by complete case analysis. P-values were calculated using likelihood ratio tests. Statistical analyses were conducted in Stata/IC v13.1 (StataCorp, Lakeway Dr., College Station, Texas); specifically, multilevel logistic regression models and ICCs were estimated using xtlogit, median odds ratios were calculated using xtrho, and marginal odds ratios and variance scaling factors were calculated directly⁴⁶.
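As an illustration of the likelihood ratio tests mentioned above, the sketch below computes the test statistic and p-value from the log-likelihoods of two nested models, using only the Python standard library. The function names are illustrative. The `boundary` option halves the p-value for a one-parameter variance test, the usual correction when the null value (variance of zero) lies on the edge of the parameter space; the source does not state whether this correction was applied, so it is included here as an assumption.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up two df at a time.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lr_test(ll_reduced: float, ll_full: float, df: int, boundary: bool = False):
    """Return (statistic, p-value) for a likelihood ratio test of nested models."""
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    p = chi2_sf(stat, df)
    # Assumed correction for a single variance parameter tested at zero:
    # the reference distribution is a 50:50 mixture of chi2(0) and chi2(1).
    if boundary and df == 1 and stat > 0:
        p *= 0.5
    return stat, p
```

Here `df` is the difference in the number of estimated parameters between the two models, so a categorical variable with several levels contributes more than one degree of freedom.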

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
