Bioinformatic analysis

Nida Amin, Sarah Schwarzkopf, Asako Kinoshita, Johanna Tröscher-Mußotter, Sven Dänicke, Amélia Camarinha-Silva, Korinna Huber, Jana Frahm, Jana Seifert

The Illumina amplicon sequencing datasets covering the V1-V2 region of the 16S rRNA gene were analysed with QIIME 2 (2019.10) [49]. The paired-end (PE) Illumina raw sequences (2 × 250 bp) were imported into QIIME 2 as the MultiplexedPairedEndBarcodeInSequence semantic type. The PE sequences were demultiplexed with cutadapt (v2.6) through the q2-cutadapt plugin and its demux-paired command. The error tolerance was raised from its default value to 0.2. Residual artificial sequences, namely barcodes, the forward primer (22 bp) and the reverse primer (19 bp), were trimmed with cutadapt (v2.6) through the q2-cutadapt plugin and its trim-paired command [50].

Quality filtering and joining of the PE reads were performed with the DADA2 pipeline through the q2-dada2 plugin and its denoise-paired command [51]. The trimmed PE sequences were quality filtered by retaining high-quality bases (average quality score above 30), and the PE reads were joined at a mean length of 313 ± 6 bp. Chimeric sequences, non-overlapping regions and singletons were discarded. FeatureTable [Frequency] and FeatureData [Sequence] QIIME 2 artifacts were then generated. The PE sequences from each sequencing run were processed separately throughout the analysis, which produced one FeatureTable [Frequency] and one FeatureData [Sequence] artifact per sequencing run after the DADA2 step. The filtered FeatureTable [Frequency] artifacts were merged with the qiime feature-table merge command, and the FeatureData [Sequence] artifacts were merged with the qiime feature-table merge-seqs command. This gave a total of 6,141,120 reads, with 23,262 ± 1758 reads (mean ± SEM) per sample.

Taxonomic classification was performed with the q2-feature-classifier plugin and its classify-sklearn method. It used a scikit-learn-based taxonomy classifier pre-trained on the SILVA 16S rRNA reference database (release 132), with the default confidence threshold of 0.7 [52, 53].
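The merge step and the mean ± SEM read summary above can be illustrated with a small pure-Python sketch. It does not call QIIME 2. It assumes that each per-run FeatureTable [Frequency] is a plain dictionary mapping sample IDs to {feature ID: count}, and that each FeatureData [Sequence] is a dictionary mapping feature IDs to sequences. The function names are hypothetical. The merge rejects sample IDs that appear in two runs. This is intended to mirror the default overlap handling of qiime feature-table merge (error_on_overlapping_sample), but that correspondence should be treated as an assumption.

```python
import math
from statistics import mean, stdev

# Hypothetical stand-ins for QIIME 2 artifacts (not the real artifact API):
#   feature table: {sample ID: {feature ID: read count}}
#   sequences:     {feature ID: DNA sequence}
# q2-dada2 uses MD5 hashes of the sequences as feature IDs by default, so an
# identical sequence variant found in two runs ends up with the same ID.


def merge_tables(tables):
    """Combine per-run tables, rejecting sample IDs found in more than one run."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} occurs in more than one table")
            merged[sample] = dict(counts)
    return merged


def merge_seqs(seq_sets):
    """Union per-run sequence sets; a feature ID seen twice keeps its first sequence."""
    merged = {}
    for seqs in seq_sets:
        for feature, seq in seqs.items():
            merged.setdefault(feature, seq)
    return merged


def read_summary(table):
    """Return (total reads, mean reads per sample, SEM of reads per sample)."""
    depths = [sum(counts.values()) for counts in table.values()]
    sem = stdev(depths) / math.sqrt(len(depths))  # SEM = sample SD / sqrt(n)
    return sum(depths), mean(depths), sem


run1 = {"S1": {"f1": 10, "f2": 5}, "S2": {"f1": 3}}
run2 = {"S3": {"f1": 7, "f3": 4}}
total, avg, sem = read_summary(merge_tables([run1, run2]))
print(f"{total} reads, {avg:.1f} ± {sem:.1f} (mean ± SEM) per sample")
# prints: 29 reads, 9.7 ± 3.5 (mean ± SEM) per sample
```

Because features are keyed by ID, counts for a sequence variant shared between runs fall into the same column of the merged table without any extra matching step.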
Sequences assigned to Cyanobacteria and chloroplasts were removed from the FeatureData [Sequence] and FeatureTable [Frequency] artifacts, as were non-bacterial and unassigned sequences. This was done by taxonomy-based filtering with the q2-taxa plugin in QIIME 2, using the qiime taxa filter-seqs and qiime taxa filter-table commands. Samples with fewer than 5000 reads were removed from the FeatureTable [Frequency] and FeatureData [Sequence] artifacts with the qiime feature-table filter-samples and qiime feature-table filter-seqs commands. A biom feature table (FeatureTable [Frequency] with taxonomy annotations) was produced with the biom add-metadata command in QIIME 2. It was later converted to text (txt) format with the biom convert command. The feature table was then filtered again under strict criteria to remove low-abundance OTUs (≤ 0.2% of total reads per sample). This resulted in a total of 4,741,355 reads and 4906 unique bacterial OTUs. Mean read counts per sample were 17,716 ± 1590 for stomach tubing samples and 21,014 ± 2014 for buccal swab samples (mean ± SEM). All unique bacterial OTUs were taxonomically reassigned using the RDP database [54] and the naïve Bayesian RDP classifier [55]. The output taxonomy table was filtered according to [56] using a defined cut-off value for each taxonomic level. Taxonomic assignments were omitted if they fell below the following sequence identity thresholds: genus (94.5%), family (86.5%), order (82.0%), class (78.5%) and phylum (75.0%).
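The filtering steps above can be sketched in plain Python. This is not the QIIME 2 or biom implementation. The sketch makes the following assumptions:

- The feature table is held as {sample ID: {OTU ID: count}}.
- The taxonomy is held as {OTU ID: label}, with SILVA 132-style labels beginning "D_0__Bacteria".
- All function names are hypothetical.

Case-insensitive substring matching loosely imitates taxonomy-based exclusion with qiime taxa filter-table. The 0.2% criterion is read here as a per-sample cut-off: an OTU's count is dropped from a sample when it holds ≤ 0.2% of that sample's reads. This is one possible interpretation. The rank cut-offs are compared against a per-rank percentage score supplied by the caller. Whether that score is a classifier confidence or a sequence identity is defined by [56] and is left abstract here.

```python
from fractions import Fraction

# Hypothetical input formats (plain dicts, not QIIME 2 objects):
#   table:    {sample ID: {OTU ID: read count}}
#   taxonomy: {OTU ID: SILVA 132-style label, e.g. "D_0__Bacteria;D_1__Firmicutes"}
EXCLUDE_TERMS = ("cyanobacteria", "chloroplast")  # matched case-insensitively
RANK_CUTOFFS = {  # per-rank cut-offs (%) as listed in the text, following [56]
    "phylum": 75.0, "class": 78.5, "order": 82.0, "family": 86.5, "genus": 94.5,
}


def keep_feature(label):
    """Keep bacterial labels that mention neither Cyanobacteria nor chloroplasts."""
    lowered = label.lower()
    if not lowered.startswith("d_0__bacteria"):
        return False  # drops "Unassigned" as well as archaeal/eukaryotic labels
    return not any(term in lowered for term in EXCLUDE_TERMS)


def filter_taxa(table, taxonomy):
    """Remove OTUs whose label fails keep_feature (or that have no label at all)."""
    kept = {otu for otu, label in taxonomy.items() if keep_feature(label)}
    return {sample: {otu: n for otu, n in counts.items() if otu in kept}
            for sample, counts in table.items()}


def drop_shallow_samples(table, min_reads=5000):
    """Remove samples with fewer than min_reads reads."""
    return {sample: counts for sample, counts in table.items()
            if sum(counts.values()) >= min_reads}


def drop_low_abundance(table, max_fraction=Fraction(2, 1000)):
    """Within each sample, drop OTUs holding <= max_fraction of its reads.

    The per-sample reading of the 0.2% criterion is an assumption.
    """
    filtered = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        # `total and ...` skips empty samples instead of dividing by zero
        filtered[sample] = {otu: n for otu, n in counts.items()
                            if total and Fraction(n, total) > max_fraction}
    return filtered


def apply_rank_cutoffs(assignments):
    """Replace a rank's name with None when its score (%) is below the cut-off.

    assignments: {rank: (name, score)}, a hypothetical format.
    """
    return {rank: (name if score >= RANK_CUTOFFS[rank] else None)
            for rank, (name, score) in assignments.items()}
```

Exact fractions are used so that an OTU holding exactly 0.2% of a sample's reads is dropped, as the ≤ in the text requires. This avoids letting floating-point rounding decide the edge case.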
