Amplicon data processing and community analysis

Sudarshan A. Shetty
Ben Kuipers
Siavash Atashgahi
Steven Aalvink
Hauke Smidt
Willem M. de Vos

Raw reads were processed in R with the DADA2 package (v 1.20.0)75. Low-quality reads, reads with more than 2 errors, and reads matching PhiX were removed with the filterAndTrim function, and chimeric sequences were removed with removeBimeraDenovo (consensus method). After these steps, a total of 1,010,000 reads were obtained from 23 samples. The samples included one mock community composed of known 16S rRNA gene sequences of microorganisms occurring in the human gut, and two inoculum samples in which the candidate strains were mixed at equal cell densities76. The raw paired-end 16S rRNA gene amplicon sequences have been submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) under project accession number PRJEB36253.
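
As an illustration of the error-based read filter, the sketch below computes the expected number of errors in a read from its Phred quality scores, EE = sum(10^(-Q/10)), which is the quantity that the maxEE argument of DADA2's filterAndTrim thresholds. It assumes that "more than 2 errors" refers to this expected-error criterion with a threshold of 2; the function names and the example reads are hypothetical, and this is not the R code used in the study.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of 10 ** (-Q / 10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_ee=2.0):
    """Keep a read only if its expected errors do not exceed max_ee.

    max_ee=2.0 is an assumption based on the "more than 2 errors" wording.
    """
    return expected_errors(phred_scores) <= max_ee


# Hypothetical reads, given as lists of per-base Phred quality scores.
high_quality = [30] * 100  # 100 bases at Q30, error probability 0.001 each
low_quality = [10] * 30    # 30 bases at Q10, error probability 0.1 each

for name, read in [("high_quality", high_quality), ("low_quality", low_quality)]:
    print(name, round(expected_errors(read), 3), passes_filter(read))
```

The high-quality read has about 0.1 expected errors and is kept, while the low-quality read has about 3 expected errors and is discarded.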

Taxonomy was assigned using a custom database containing the full-length 16S rRNA gene sequences of all ten strains used in the present study (database available at https://github.com/microsud/Db-MM-10)77. Classification was done with the RDP classifier78. Unclassified amplicon sequence variants (ASVs) were removed before further analyses. The ASVs identified by DADA2 were collapsed at the species level, and counts were corrected for differences in the 16S rRNA gene copy number identified in the individual strain genomes (Supplementary Table 2) by dividing the raw species-level counts by the corresponding copy number. The corrected counts were then scaled to the total 16S rRNA gene copies per sample obtained from qPCR, which corrects for differences in sequencing depth and yields quantitative microbiota profiles, similar to a previous study that was based on cell counts79. Because of the limited complexity of the community and the exact genome-based knowledge of the 16S rRNA gene copy number of each strain (Supplementary Table 2), qPCR was preferred over fluorescence-activated cell sorting (FACS), which can be biased by cell aggregation80. Further analysis of community composition and structure was done using the microbiome R package (v 1.14.0)65. Data were visualized primarily with the ggplot2 (v 3.3.3) and ggpubr (v 0.4.0) R packages.
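
The copy-number correction and qPCR scaling described above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: raw species-level counts are divided by the 16S rRNA gene copy number, converted to fractions of the corrected total, and multiplied by the qPCR-derived total for the sample. The function name, species names, counts, copy numbers, and qPCR total are all hypothetical; the real copy numbers are those in Supplementary Table 2.

```python
def quantitative_profile(raw_counts, copy_numbers, qpcr_total_copies):
    """Return copy-number-corrected abundances scaled to a qPCR total.

    raw_counts: species -> raw read count for one sample
    copy_numbers: species -> 16S rRNA gene copies per genome
    qpcr_total_copies: total 16S rRNA gene copies measured by qPCR for the sample
    """
    # Step 1: divide raw species-level counts by the 16S rRNA gene copy number.
    corrected = {sp: raw_counts[sp] / copy_numbers[sp] for sp in raw_counts}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("sample has no reads after correction")
    # Step 2: scale the corrected fractions to the qPCR total, which removes
    # the dependence on sequencing depth.
    return {sp: qpcr_total_copies * c / total for sp, c in corrected.items()}


# Hypothetical two-species sample (values are illustrative only).
raw = {"Species A": 600, "Species B": 400}
copies = {"Species A": 6, "Species B": 2}

# Corrected counts are 100 and 200, so Species B makes up two thirds of the profile.
profile = quantitative_profile(raw, copies, qpcr_total_copies=1e9)
for species, abundance in profile.items():
    print(species, f"{abundance:.3e}")
```

In this example Species A has more raw reads, but after correcting for its higher copy number it is the less abundant of the two, which is the kind of bias the correction is meant to remove.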
