The raw sequence data were processed with DADA2 1.14 in R 3.6.3 [119] to infer amplicon sequence variants (ASVs) [120]. Specifically, the demultiplexed paired-end reads were trimmed to remove the primer sequences (forward reads, first 20 bp; reverse reads, first 18 bp), truncated at the position where the median Phred quality score dropped sharply (forward reads, at position 290 bp; reverse reads, at position 248 bp) and filtered to remove low-quality reads. After trimming and filtering, the run-specific error rates were estimated and the ASVs were inferred by pooling reads from all the samples sequenced in the same run. After the reads were merged, chimeras were removed using the “pooled” method. The resulting raw ASV table and representative sequences were imported into QIIME2 (version 2020.2) [121]. Taxonomy was assigned by a scikit-learn naive Bayes machine-learning classifier [122], which was trained on the SILVA 132 99% OTUs [123] trimmed to include only the regions of the 16S rRNA gene amplified by our primers. ASVs identified as chloroplasts or mitochondria were excluded from the ASV table. The ASV table was then conservatively filtered to remove ASVs that had no phylum-level taxonomic assignment or appeared in only one biological sample. Contaminating ASVs were identified based on two suggested criteria: contaminants are often found in negative controls and inversely correlate with sample DNA concentration [98]. The ASVs filtered from the raw ASV table were also removed from the representative sequences, which were then inserted into a reference phylogenetic tree built on the SILVA 128 database using SEPP [124]. The alpha rarefaction curves and the core metrics results were generated with sampling depths of 10,000 and 2,047 sequences per sample, respectively (Fig. S9).
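The ASV-level filtering rules described above (excluding chloroplast and mitochondrial ASVs, ASVs without a phylum-level assignment, and ASVs found in only one sample) can be illustrated with a minimal Python sketch. This is not the code used in the study, where filtering was done on the QIIME2 artifacts; the toy counts, the simplified list-of-ranks taxonomy and the names `filter_asvs`, `is_organelle`, `has_phylum` and `prevalence` are all hypothetical and serve only to make the logic explicit.

```python
# Toy ASV table (hypothetical data): ASV id -> read counts in samples S1..S3.
counts = {
    "ASV1": [120, 80, 0],
    "ASV2": [15, 0, 0],    # present in one sample only
    "ASV3": [40, 35, 50],  # chloroplast
    "ASV4": [0, 22, 31],   # no phylum-level assignment
    "ASV5": [9, 0, 14],    # mitochondria
    "ASV6": [3, 7, 0],
}

# Simplified lineages as lists of rank names (assumed format, not SILVA's own).
taxonomy = {
    "ASV1": ["Bacteria", "Firmicutes", "Bacilli"],
    "ASV2": ["Bacteria", "Proteobacteria"],
    "ASV3": ["Bacteria", "Cyanobacteria", "Oxyphotobacteria", "Chloroplast"],
    "ASV4": ["Bacteria"],
    "ASV5": ["Bacteria", "Proteobacteria", "Alphaproteobacteria",
             "Rickettsiales", "Mitochondria"],
    "ASV6": ["Bacteria", "Actinobacteria"],
}

ORGANELLES = {"chloroplast", "mitochondria"}


def is_organelle(lineage):
    """True if any rank in the lineage is a chloroplast or mitochondria label."""
    return any(rank.lower() in ORGANELLES for rank in lineage)


def has_phylum(lineage):
    """True if the lineage reaches the phylum rank (second entry, non-empty)."""
    return len(lineage) >= 2 and lineage[1] != ""


def prevalence(sample_counts):
    """Number of samples in which the ASV has at least one read."""
    return sum(1 for n in sample_counts if n > 0)


def filter_asvs(counts, taxonomy, min_samples=2):
    """Keep ASVs that pass all three filters described in the text."""
    kept = {}
    for asv, sample_counts in counts.items():
        lineage = taxonomy.get(asv, [])
        if is_organelle(lineage):
            continue
        if not has_phylum(lineage):
            continue
        if prevalence(sample_counts) < min_samples:
            continue
        kept[asv] = sample_counts
    return kept


filtered = filter_asvs(counts, taxonomy)
print(sorted(filtered))
```

In this toy table only `ASV1` and `ASV6` pass all three filters. Contaminant identification is deliberately left out of the sketch, because it depends on negative controls and DNA concentration data that are not modelled here.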
For downstream data analysis and visualization, QIIME2 artifacts were imported into R using the qiime2R package [125], and a phyloseq [126] object was assembled from the sample metadata, ASV table, taxonomy and phylogenetic tree. The core ASVs were calculated using a prevalence threshold of 80% and visualized with a Venn diagram. The alpha-diversity indices, including observed ASVs, Pielou’s evenness, Shannon’s index and Faith’s phylogenetic diversity (PD), were computed with the R packages microbiome [127] and picante [128]. For beta-diversity analyses, we used four distance matrices: Jaccard distance, unweighted UniFrac distance, Aitchison distance and phylogenetic isometric log-ratio (PHILR) transformed Euclidean distance. Because rarefying remains the best solution for unweighted distance matrices [129], the Jaccard distance and unweighted UniFrac distance were computed in QIIME2 using the rarefied ASV table. The compositionality-aware distance matrices, Aitchison distance and PHILR transformed Euclidean distance, were calculated using the unrarefied ASV table. The Aitchison distance was computed with the DEICODE plugin in QIIME2, which implements a form of Aitchison distance that is robust to high levels of sparsity because it uses matrix completion to handle the excessive zeros in microbiome data [130]. The PHILR transform of the ASV table was performed in R using the philr package [131]. The selected distance matrices were explored and visualized by principal coordinates analysis (PCoA).
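For readers unfamiliar with these indices, the following Python sketch shows how the tree-free ones are defined. It is illustrative only and is not the code used in the study, which relied on the R packages and QIIME2 plugins cited above. Several simplifications are assumed: Shannon’s index uses the natural logarithm; the Aitchison distance is computed from a centered log-ratio (CLR) transform with a pseudocount of 1, which is a simpler stand-in for the matrix-completion approach of DEICODE; and Faith’s PD, unweighted UniFrac and PHILR are omitted because they require a phylogenetic tree. The function names and toy count vectors are hypothetical.

```python
import math


def shannon(counts):
    """Shannon's index H = -sum(p * ln p) over ASVs with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S), with S the number of observed ASVs."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not informative for fewer than two ASVs
    return shannon(counts) / math.log(observed)


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance: 1 - |A and B| / |A or B|."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform; the pseudocount is an assumed zero fix."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(a, b, pseudocount=1.0):
    """Euclidean distance between the CLR-transformed count vectors."""
    clr_a = clr(a, pseudocount)
    clr_b = clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


# Toy count vectors for two samples over the same four ASVs (hypothetical).
sample_a = [10, 10, 10, 10]
sample_b = [20, 0, 5, 0]

print("Shannon (a):", shannon(sample_a))
print("Pielou (a):", pielou_evenness(sample_a))
print("Jaccard (a, b):", jaccard_distance(sample_a, sample_b))
print("Aitchison (a, b):", aitchison_distance(sample_a, sample_b))
```

For the perfectly even `sample_a`, Shannon’s index equals ln 4 and Pielou’s evenness equals 1; the Jaccard distance between the two samples is 0.5, because they share two of the four ASVs observed across both.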