Microbiome 16S rRNA sequencing statistical analysis

CK C. Keating
MB M. Bolton-Warberg
JH J. Hinchcliffe
RD R. Davies
SW S. Whelan
AW A. H. L. Wan
RF R. D. Fitzgerald
SD S. J. Davies
UI U. Z. Ijaz
CS C. J. Smith
ask Ask a question
Favorite

The microbial community of the hindgut of samples across treatments was characterised to assess whether treatment impacted the microbial composition of the gut. Samples were grouped into defining categories bases on treatment and/or time (Week 0, CTRL_8, ULVA_8, ASCO_N_8, ASCO_LG_8, CTRL_12, ULVA_12, ASCO_N_12, ASCO_LG_12). These groupings were then used to investigate changes in the microbiota with respect to treatment and time. Microbiome statistical analyses were carried out via the software R (v 3.4.2 [118]) with the OTU biom file, phylogenetic tree and the associated metadata files for the study. Spurious OTUs (not a bacterial or archaeal species) and OTUs found in the negative control were removed from all samples before statistical analyses.

were carried out using R’s ‘Vegan’ package [119]. The following alpha diversity measures were then calculated - Pielou’s evenness (how equal in number the relative OTUs are); Richness (the number of unique OTUs) and Shannon’s index of entropy (the degree of uncertainty within a grouping assuming that a high degree of uncertainty corresponds to a highly diverse sample [120]). These were calculated on rarefied microbiome data (Supplementary Figure S3). Pair-wise ANOVA P-values were drawn on top of alpha diversity figures. Beta-diversity measures were visualised using non-metric multidimensional scaling (NMDS) plots using standard dissimilarity distance measures: Bray-Curtis and Unifrac (Unweighted and Weighted). Unifrac distances were calculated using the R package ‘Phyloseq’ [121]. Samples were grouped for different treatments as well as the mean ordination value and spread of points (ellipses were drawn using Vegan’s ordiellipse() function that represents the 95% confidence interval of the standard errors). R’s ‘Vegan’ package and ordisurf() function was used to determine if the main pattern in species composition described in principal co-ordinate analysis (PCOA) could be explained by the variable of fish condition factor (K). Ordisurf() internally fits a generalised additive model “gam” smoothing with formula K ~ s (PCoA1, PCoA2).

In order to interpret this complex high-throughput dataset we used R’s mixOmics package [122] and Sparse Projection to Latent Structure – Discriminant Analysis (sPLS-DA) to select the most important and distinguishing OTUs between the treatments and over time. Analysis was performed at genus level with genera > 3.5% abundance and with TSS + CLR (Total Sum Scaling followed by Centralised Log Ratio) normalisation applied. OTU feature selection and multiple data integration was achieved by constructing artificial latent components (inferred variables) of the OTU table by first factorizing the abundance table and response variable (that contains categorical information of samples) matrices into scores and loading vectors in a new space such that the covariance between the scores of these matrices are maximised. In doing so, the algorithm also enforces a constraint on the loading vector associated with the abundance table (that has corresponding weight for each feature) such that some components of the loading vector go to zero and only significant features in terms of their discriminatory power remain. In this study, sPLS-DA was applied to determine the effect of ‘Treatment’ and ‘Time’. ‘Time’ contained Week 0 and all treatments except the ASCO_LG fish (i.e. CTRL, ULVA, ASCO_N) and longitudinal information (Week 0, Week 8, Week 12). ‘Treatment’ examines the independent treatments including all groups (CTRL, ULVA, ASCO_N and ASCO_LG) at either Week 8 or Week 12. The number of latent components and the number of discriminants were calculated. In finding the number of discriminants the model was fine-tuned using leave-one-out cross-validation and subsequent balanced error rates which accounts for sample number differences (fine-tuning parameters indicated in figure legends for Fig. Fig.66 and Figure S5). The final output was a heatmap containing the differential genera over the number of tuned components which is discussed in the Results section. Further description of the implementation of sPLS-DA for microbial data-sets can be found in Gauchotte-Lindsay et al. [123]. The OTUs identified in sPLS-DA were subsequently used in differential tree analysis to show the taxonomic shifts in microbial community structure occurring over time. The complete methodology is outlined in [124] and uses the DESeqDataSetFromMatric() function from the DESeq2 package [125] Metacoder package to generate the tree visualisations [126] in R.

We followed a method for identifying the core microbiome as outlined McKenna et al. [124], and Shetty et al. [127] using R’s microbiome package [128] adjustable parameters for percentage relative abundance and percentage prevalence within samples. For the samples in this study, the core microbiome was defined as prevalent in 85% of samples [129].

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A