Bioinformatic and Statistical Analysis

Modupe O. Coker
Hannah E. Laue
Anne G. Hoen
Margaret Hilliard
Erika Dade
Zhigang Li
Thomas Palys
Hilary G. Morrison
Emily Baker
Margaret R. Karagas
Juliette C. Madan

Fastq files containing 16S and WGS sequence reads were quality-filtered and trimmed with Kneaddata and Trimmomatic (Bolger et al., 2014), respectively. 16S rRNA gene sequence processing and analyses were done using R version 3.52. The DADA2 sequence processing pipeline (v.1.6.0) (Callahan et al., 2016) was used to infer the amplicon sequence variants (ASVs) present and their relative abundances across samples. Thereafter, taxonomic assignment was performed on ASVs using the Greengenes classifier (DeSantis et al., 2006). We constructed a phylogenetic tree using the DECIPHER (version) and phangorn (v.2.4.0) R packages. All downstream analyses were stratified by delivery mode and based on the eight groups previously described for 6-week and 1-year samples (four groups each).
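To make the relative-abundance step concrete, the following minimal sketch converts a per-sample table of ASV counts into proportions. It is written in plain Python rather than the R/DADA2 workflow described above, and the sample names, ASV labels and counts are invented for illustration only.

```python
def relative_abundance(counts):
    """Convert {sample: {asv: count}} into {sample: {asv: proportion}}."""
    result = {}
    for sample, asv_counts in counts.items():
        total = sum(asv_counts.values())
        if total == 0:
            # A sample with no reads has no defined composition; report zeros.
            result[sample] = {asv: 0.0 for asv in asv_counts}
            continue
        result[sample] = {asv: n / total for asv, n in asv_counts.items()}
    return result


# Hypothetical ASV counts for two samples (not data from this study).
example_counts = {
    "sample_A": {"ASV1": 60, "ASV2": 30, "ASV3": 10},
    "sample_B": {"ASV1": 5, "ASV2": 0, "ASV3": 15},
}
example_rel = relative_abundance(example_counts)
```

Within each sample the proportions sum to 1, which is what makes samples of different sequencing depth comparable.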

Within phyloseq (version) (McMurdie and Holmes, 2013), ASV abundances were used to calculate alpha diversity indices. To determine the statistical significance of differences in alpha diversity indices between groups, F-tests, Student’s t-tests and multivariable linear regression analyses were performed (adjusted for gestational age, antibiotic use and solid food introduction). Using a midpoint-rooted phylogenetic tree generated from phangorn (v.2.4.0), overall community differences between samples (beta diversity) were tested within vegan (v.2.4.6) by permutational multivariate analysis of variance (PERMANOVA) of pairwise generalized UniFrac distance matrices, with 1000 permutations.
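The logic of a one-factor PERMANOVA can be sketched as follows. This is not the vegan implementation and it does not compute generalized UniFrac: it assumes a precomputed symmetric distance matrix, at least two groups and nonzero within-group dispersion. The toy distances and group labels are hypothetical. The pseudo-F statistic is computed from squared pairwise distances, and the p value is the share of label permutations whose pseudo-F is at least as large as the observed one.

```python
import random


def pseudo_f(dist, labels):
    """Pseudo-F for one grouping factor, from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared pairwise distances divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity computed per group.
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_g = sum(
            dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        )
        ss_within += ss_g / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_perm=1000, seed=0):
    """Return (observed pseudo-F, permutation p value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties are not lost to floating-point rounding.
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    # The observed labelling counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: six samples placed on a line, two well-separated groups.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
example_dist = [[abs(a - b) for b in points] for a in points]
example_labels = ["group_1"] * 3 + ["group_2"] * 3
observed_f, p_value = permanova(example_dist, example_labels, n_perm=1000)
```

With only six samples there are few distinct labelings, so the p value cannot become very small no matter how well the groups separate. This is one reason the number of permutations and the sample size both matter when interpreting PERMANOVA results.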

To identify significant associations between metadata and transformed microbial taxonomic or functional abundances, we applied two multivariate regression models that allow for mixed effects: Multivariate microbial Association by Linear models (MaAsLin; Morgan et al., 2012) and Inference for Absolute Abundance (IFAA; Li et al., 2021). MaAsLin utilizes robust additive general linear models on relative abundances, whereas IFAA employs robust estimating equations for parameter estimation of differential absolute abundance. Both methods were able to establish associations and identify differentially abundant taxa while adjusting for multiple time points and other confounders. MaAsLin results indicate the direction of associations (coefficients and q values) with respect to relative abundance, while IFAA results indicate the magnitude of change in absolute abundance (estimates and 95% confidence intervals). Additional information regarding these methods can be found in Supplementary Methods.
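Since results are reported as q values, a short sketch of how q values can be derived from per-feature p values may help. It assumes the Benjamini-Hochberg false discovery rate procedure, which the text above does not specify. It is not MaAsLin code, and the p values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values for four taxa.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
```

Each q value is at least as large as its p value, and features are typically called significant when q falls below a chosen threshold.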

Shotgun metagenomic reads were input into the HUMAnN2 (HMP Unified Metabolic Analysis Network) suite of tools under default parameters (Franzosa et al., 2018). MetaPhlAn2 was used to extract taxonomic profiles, while functional pathways were assigned to reads based on the ChocoPhlAn databases and genes based on UniRef90 (Suzek et al., 2015). The HUMAnN2 gene abundance table was regrouped and mapped based on the MetaCyc database (Caspi et al., 2018). As with the 16S sequencing data and using the same parameters, the resulting taxonomic and pathway abundance tables from HUMAnN2 were analyzed with MaAsLin and IFAA to determine significant features associated with the comparison groups within a multivariable model.
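The regrouping step can be pictured as summing gene-family abundances into the functional categories they map to. The sketch below is a simplified illustration under that assumption, not the HUMAnN2 implementation. The gene identifiers, category names, the mapping and the "UNGROUPED" bucket label are all hypothetical.

```python
from collections import defaultdict


def regroup(gene_abundance, gene_to_groups):
    """Sum gene abundances into the groups each gene maps to."""
    grouped = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        targets = gene_to_groups.get(gene)
        if not targets:
            # Genes without a mapping are pooled so no abundance is dropped.
            grouped["UNGROUPED"] += abundance
            continue
        for target in targets:
            # A gene mapping to several groups contributes fully to each.
            grouped[target] += abundance
    return dict(grouped)


# Hypothetical gene-family abundances and gene-to-reaction mapping.
example_genes = {"UniRef90_X1": 10.0, "UniRef90_X2": 5.0, "UniRef90_X3": 2.5}
example_map = {
    "UniRef90_X1": ["RXN-A"],
    "UniRef90_X2": ["RXN-A", "RXN-B"],
}
example_grouped = regroup(example_genes, example_map)
```

Because a gene that maps to several categories is counted in each of them, the regrouped totals can exceed the original gene-level total.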
