V3–V4 16S rRNA gene sequencing and data pre-processing

Núria Mach
Léa Lansade
David Bars-Cortina
Sophie Dhorne-Pollet
Aline Foury
Marie-Pierre Moisan
Alice Ruet

Reads were processed with the DADA2 plug-in for QIIME 2 (version 2019.10), which implements the Divisive Amplicon Denoising Algorithm (DADA), to perform quality filtering and chimera removal and to construct a feature table of read abundance per amplicon sequence variant (ASV) and sample [62]. DADA2 models amplicon sequencing error in order to identify unique ASVs and to infer sample composition more accurately than traditional Operational Taxonomic Unit (OTU) picking methods, which identify representative sequences from clusters of sequences based on a percent-similarity cut-off [62]. The output of DADA2 was an abundance table in which each unique sequence was characterized by its abundance in each sample. Taxonomy was assigned to ASVs by importing SILVA 16S representative sequences and consensus taxonomy (release 132, 99% identity) into QIIME 2 and classifying representative ASVs with the naive Bayes classifier plug-in [63]. The feature table, taxonomy and phylogenetic tree were then exported from QIIME 2 to the R statistical environment and combined into a phyloseq object [64]. Prevalence filtering was applied to remove ASVs with less than 1% prevalence and present in fewer than three individuals, reducing the chance that data artifacts affected the analysis [62]. To reduce the effects of uncertainty in ASV taxonomic classification, we conducted most of our analysis at the microbial genus level.
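
The prevalence filter described above can be illustrated with a minimal sketch. This is not the authors' code (the analysis was run in R with phyloseq); it is plain Python under stated assumptions: prevalence is taken to mean the fraction of samples in which an ASV has at least one read, and an ASV is dropped only when it fails both thresholds, which is one possible reading of the sentence. The function name `prevalence_filter`, its parameters and the example counts are hypothetical.

```python
def prevalence_filter(counts, min_fraction=0.01, min_samples=3):
    """Keep ASVs that pass a prevalence filter.

    counts: dict mapping ASV id -> list of read counts, one per sample.
    Assumption: an ASV is removed when it is present in less than
    `min_fraction` of samples AND in fewer than `min_samples` samples.
    """
    kept = {}
    for asv, row in counts.items():
        n_samples = len(row)
        if n_samples == 0:
            continue  # nothing to evaluate for an empty row
        n_present = sum(1 for c in row if c > 0)
        fraction = n_present / n_samples
        if fraction < min_fraction and n_present < min_samples:
            continue  # rare ASV, treated as a likely artifact
        kept[asv] = row
    return kept


# Made-up counts for 200 samples (illustration only).
example_counts = {
    "asv_a": [10] * 150 + [0] * 50,  # present in 150 samples
    "asv_b": [5] + [0] * 199,        # present in 1 sample (0.5%)
    "asv_c": [3] * 4 + [0] * 196,    # present in 4 samples (2%)
}
filtered = prevalence_filter(example_counts)
print(sorted(filtered))
```

If the intended rule is instead to drop an ASV that fails either threshold, the `and` in the condition would become `or`.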

The phyloseq [65], vegan [66] and microbiome R packages were used for downstream analysis. The minimum sampling depth in our data set was 10,423 reads per sample. Before diversity indices were estimated, samples were rarefied to an even depth of 10,000 reads with the rarefy_even_depth function in the phyloseq R package, an ad hoc means of normalizing microbiome counts from libraries of widely differing sizes. A depth of 10,000 reads was sufficient to profile bacterial composition accurately, as indicated by rarefaction curves for observed richness and the Shannon index (which accounts for both abundance and evenness).
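
To make the rarefaction and diversity steps concrete, here is a small Python sketch of the underlying idea; it does not reproduce phyloseq's rarefy_even_depth or the authors' R workflow. It assumes subsampling without replacement (the text does not say which mode was used) and computes the Shannon index with the natural logarithm, H = -Σ p_i ln p_i. The function names, the seed and the example counts are hypothetical.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads from one sample, without replacement.

    counts: dict mapping taxon -> read count for a single sample.
    Returns a new dict of subsampled counts, or None when the sample
    holds fewer than `depth` reads and therefore cannot be rarefied.
    """
    # Expand the counts into one entry per read, then draw from that pool.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    rarefied = {}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


# Made-up sample with 11,423 reads, rarefied to 10,000 (illustration only).
sample = {"genus_a": 6000, "genus_b": 5000, "genus_c": 423}
rarefied_sample = rarefy(sample, depth=10_000, seed=1)
print(observed_richness(rarefied_sample), shannon(rarefied_sample))
```

Repeating the subsampling at increasing depths and plotting richness or the Shannon index against depth gives a rarefaction curve; a curve that has flattened by 10,000 reads supports the chosen depth.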

ASV counts per sample and ASV taxonomic assignments are available in Supplementary Table S4. Data were aggregated at the genus, family, order, class and phylum levels using the taxonomic agglomeration method in the phyloseq R package, which merges taxa that share the same category at a user-specified taxonomic level.
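
Taxonomic agglomeration amounts to summing the counts of all ASVs that share a label at the chosen rank. The Python sketch below shows that idea only; it is a simplification and not phyloseq's implementation. It groups by the name at a single rank and, as an assumption, drops ASVs that have no assignment at that rank. The function name `agglomerate` and the example tables are hypothetical.

```python
def agglomerate(counts, taxonomy, rank):
    """Sum per-sample counts of ASVs that share the same name at `rank`.

    counts:   dict mapping ASV id -> list of read counts, one per sample.
    taxonomy: dict mapping ASV id -> dict of rank -> taxon name.
    Assumption: ASVs without an assignment at `rank` are dropped.
    """
    merged = {}
    for asv, row in counts.items():
        name = taxonomy.get(asv, {}).get(rank)
        if name is None:
            continue  # unassigned at this rank
        if name not in merged:
            merged[name] = [0] * len(row)
        # Element-wise sum keeps one total per sample.
        merged[name] = [a + b for a, b in zip(merged[name], row)]
    return merged


# Made-up counts for three samples (illustration only).
asv_counts = {
    "asv1": [5, 0, 2],
    "asv2": [1, 4, 0],
    "asv3": [0, 7, 7],
    "asv4": [2, 2, 2],
}
asv_taxonomy = {
    "asv1": {"family": "Lachnospiraceae", "genus": "Blautia"},
    "asv2": {"family": "Lachnospiraceae", "genus": "Blautia"},
    "asv3": {"family": "Lachnospiraceae", "genus": "Roseburia"},
    "asv4": {"family": "Lachnospiraceae"},  # no genus-level assignment
}
genus_table = agglomerate(asv_counts, asv_taxonomy, "genus")
family_table = agglomerate(asv_counts, asv_taxonomy, "family")
print(genus_table)
print(family_table)
```

In this example the ASV without a genus label contributes to the family-level totals but not to the genus-level table, which is why totals can differ between ranks.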
