Sequence analysis and statistics

Ilze Elbere
Ivars Silamikelis
Ilze Izabella Dindune
Ineta Kalnina
Monta Ustinova
Linda Zaharenko
Laila Silamikele
Vita Rovite
Dita Gudra
Ilze Konrade
Jelizaveta Sokolovska
Valdis Pirags
Janis Klovins

Raw data from the sequencer were processed as follows: adapters were removed with cutadapt 1.16; sequences were trimmed with Trimmomatic v0.38 (5 bp sliding window, quality threshold = 20, average quality = 20, minimum length = 75); and host DNA sequences were removed by mapping reads with bowtie2 2.3.5.1 against the Homo sapiens reference genome (Ensembl GRCh38, release 90). Read numbers at each preprocessing step are summarized in S3 Table.
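
As a rough illustration of how the trimming settings quoted above could map onto Trimmomatic's step syntax, the sketch below assembles the step strings. SLIDINGWINDOW, AVGQUAL, and MINLEN are Trimmomatic step names, but the source does not give the exact command line, so the mapping of "average quality = 20" to AVGQUAL and the order of the steps are assumptions. The function name is hypothetical.

```python
def trimmomatic_steps(window=5, quality=20, avg_quality=20, min_length=75):
    """Build Trimmomatic step arguments from the settings quoted in the text.

    Assumption: the steps were applied in this order; the source does not say.
    """
    return [
        f"SLIDINGWINDOW:{window}:{quality}",  # 5 bp window, quality threshold 20
        f"AVGQUAL:{avg_quality}",             # drop reads with average quality < 20
        f"MINLEN:{min_length}",               # drop reads shorter than 75 bp
    ]


if __name__ == "__main__":
    print(" ".join(trimmomatic_steps()))
```

These strings would be appended to a Trimmomatic SE or PE invocation; whether the reads were single-end or paired-end is not stated here.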

The composition and functionality of the gut microbiome were analyzed from the remaining sequences using the HUMAnN2 pipeline [19], and taxonomic data were obtained with MetaPhlAn2 [20]; both analyses were run with default parameters. Species-level alpha diversity was calculated as the exponential of the Shannon index, which gives the effective number of species, and beta diversity was analyzed with non-metric multidimensional scaling (NMDS) using Bray-Curtis distances. Beta diversity results were compared between subgroups with permutational multivariate analysis of variance (PERMANOVA). To explain the effects of environmental variables, the adonis function (vegan package) was used to test the significance of individual variables; this was complemented with Canonical Correspondence Analysis (CCA) and visualized with a biplot using R software (version 3.6.0) [21]. Variables of interest were evaluated in two cases: (1) for all samples (both groups, baseline and follow-up), to evaluate the contribution of age, gender, and BMI; (2) for T2D patient samples only, to evaluate the possible effect of the different prescribed metformin doses. Changes during metformin therapy and differences between study subgroups within the taxonomic and functional profiles were evaluated with the R package limma using the voom transformation with sample-specific quality weights (hereafter referred to as limma+voom). All tests were adjusted for age, gender, and BMI, and false discovery rate (FDR) adjusted values were used. T2D group data were adjusted for baseline HbA1c levels. Only taxa present in ≥10% of samples were included. To compare metformin therapy response groups, the corrected data matrix was used for sparse partial least squares discriminant analysis (sPLS-DA), a supervised model to reveal microbiota variation between groups.
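
The two diversity measures named above can be sketched in a few lines. This is a minimal illustration, not the authors' R code: it assumes the Shannon index uses the natural logarithm (so that its exponential is the effective number of species), and all function names are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def effective_number_of_species(abundances):
    """Exponential of the Shannon index (natural log assumed)."""
    return math.exp(shannon_index(abundances))


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        return 0.0  # two empty samples are treated as identical
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return numerator / denominator


if __name__ == "__main__":
    even = [25, 25, 25, 25]   # four equally abundant species
    skewed = [97, 1, 1, 1]    # one dominant species
    print(effective_number_of_species(even))
    print(effective_number_of_species(skewed))
    print(bray_curtis(even, skewed))
```

A perfectly even community of four species has an effective number of species of 4, while a community dominated by one species scores much closer to 1, which is why this transform is easier to interpret than the raw Shannon index.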
Key taxonomic groups responsible for the differential microbiota structure were detected using the “splsda” function in the R package “mixOmics” [22]; sPLS-DA parameters were tuned to determine the main taxonomic groups that discriminate the subgroups with the lowest possible error rate. Taxonomic groups with variable importance in projection (VIP) > 1.5 were considered important contributors to the model. Additional cellular function enrichment analysis and visualization of functional profile data were performed using the Omics Dashboard integrated into MetaCyc. The dashboard computes enrichment p-values using Grossmann's parent-child-union variation of the Fisher exact test (applying the FDR multiple hypothesis correction) and then transforms each p-value to an enrichment score: -log10(p-value). The significance threshold was set at <0.05 [23]. Statistical significance of changes or differences in the Shannon index and other analyzed parameters was evaluated with the Wilcoxon signed-rank test. Data normalization was performed as integrated into the tools used, and paired comparisons were used when appropriate.
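
To make the enrichment score concrete, the sketch below applies an FDR correction to a list of p-values and converts each to -log10(p-value). The text does not say which FDR procedure the dashboard uses, nor whether the score is computed from the corrected or the raw p-value, so the Benjamini-Hochberg procedure and the use of corrected values are assumptions here. The function names are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def enrichment_score(p_value):
    """Transform a p-value into an enrichment score, -log10(p)."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p_value)


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        flag = "significant" if q < 0.05 else "not significant"
        print(f"p={p:.3f} adjusted={q:.4f} score={enrichment_score(q):.2f} {flag}")
```

On this scale a threshold of 0.05 corresponds to an enrichment score of about 1.3, so higher scores indicate stronger enrichment.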
