Bioinformatic analysis

Jennifer Kelly
Miran Al-Rammahi
Kristian Daly
Paul K. Flanagan
Arun Urs
Marta C. Cohen
Gabriella di Stefano
Marcel J. C. Bijvelds
David N. Sheppard
Hugo R. de Jonge
Ursula E. Seidler
Soraya P. Shirazi-Beechey

Raw sequencing reads were passed through a strict filtering pipeline to remove low-quality reads. The CGR applied its standard read-filtering pipeline to all sequenced datasets, comprising: i) removal of Illumina adaptor sequences using Cutadapt [83] (version 1.2.1); ii) trimming of low-quality bases using Sickle (https://github.com/najoshi/sickle) (version 1.2), which uses a sliding window of a defined size to remove read segments that fall below a minimum Phred quality value of 20; and iii) removal of any trimmed reads shorter than 10 bp. High-quality paired-end reads were then merged into single overlapping sequences using the assembly software FLASH with the following parameters: minimum overlap, 25; maximum overlap, 250; maximum ratio of mismatches to overlap length, 0.25 [84]. Only assembled sequences longer than 200 bp were retained. Assembled sequences were then filtered for contaminating phiX sequence carried over from sequencing using BMTagger and the NCBI reference sequence for Enterobacteria phage phiX174 (NCBI accession NC_001422) [85].
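To illustrate steps ii) and iii), the following Python sketch shows one simplified way a sliding-window quality trim followed by a length filter could work. It is not Sickle's implementation: it trims only from the 3' end, assumes Phred+33 quality encoding, and uses a hypothetical window size of 5, since the window size used in the study is not stated. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

MIN_QUALITY = 20  # minimum Phred quality value, as stated in the text
MIN_LENGTH = 10   # trimmed reads shorter than this are discarded, as stated in the text
WINDOW = 5        # ASSUMPTION: hypothetical window size, not given in the text


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq: str, qual: str, window: int = WINDOW,
                min_q: int = MIN_QUALITY) -> Tuple[str, str]:
    """Cut the read at the start of the first window whose mean quality is below min_q."""
    scores = phred_scores(qual)
    if len(scores) < window:
        # Reads shorter than the window are scored as a single window.
        window = max(len(scores), 1)
    for start in range(0, len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_q:
            return seq[:start], qual[:start]
    return seq, qual


def filter_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Trim a read, then drop it (return None) if it is shorter than MIN_LENGTH."""
    trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
    if len(trimmed_seq) < MIN_LENGTH:
        return None
    return trimmed_seq, trimmed_qual
```

For example, a 30-base read whose last 10 bases have very low quality is shortened but kept, whereas a read that is low quality throughout is trimmed to nothing and discarded by the length filter.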

Filtered, de-multiplexed reads were analysed using the Quantitative Insights into Microbial Ecology 2 (QIIME2) software package (version qiime2-2021.2, https://qiime2.org) [86]. QIIME2’s “DADA2” plugin was used to resolve reads into high-resolution amplicon sequence variants (ASVs), which represent, as closely as possible, the original biological sequence of the sequenced amplicon [87]. Briefly, DADA2 constructs an error model specific to the dataset by training on the whole sequencing run, then uses this model to correct sequencing errors in the data and generate ASVs. The “DADA2” plugin also performs phiX and chimera removal. Following resolution of ASVs, multiple sequence alignment of ASV representative sequences was carried out using MAFFT, followed by masking of highly variable positions using the QIIME2 alignment plugin. FastTree was then used to infer an unrooted and subsequently a rooted approximately maximum-likelihood phylogenetic tree representing the phylogenetic relatedness of ASVs (QIIME2 phylogeny plugin). ASVs were taxonomically classified using a downloaded Naïve-Bayes classifier pre-trained on Greengenes 13_8 99% operational taxonomic units (OTUs), trimmed to include only the 250-bp V4 region bound by the 515F/806R primers used in this study (QIIME2 feature-classifier plugin). Following taxonomic classification, ASVs comprising fewer than 10 reads, found in only one sample, or classified as Mitochondria or Chloroplast were removed.
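The final ASV filtering step can be sketched in plain Python as follows. This is an illustration rather than the QIIME2 commands used in the study. It assumes that the 10-read threshold applies to an ASV's total count across all samples, that the three removal criteria are independent (an ASV is removed if it meets any one of them), and that taxonomy labels are Greengenes-style strings. The table layout and function names are hypothetical.

```python
from typing import Dict

MIN_TOTAL_READS = 10  # ASVs with fewer reads are removed (ASSUMPTION: summed over samples)
MIN_SAMPLES = 2       # ASVs found in only one sample are removed
EXCLUDED_TAXA = ("mitochondria", "chloroplast")


def keep_asv(counts: Dict[str, int], taxonomy: str) -> bool:
    """Return True if an ASV passes all three filters described in the text."""
    total_reads = sum(counts.values())
    samples_present = sum(1 for n in counts.values() if n > 0)
    label = taxonomy.lower()  # case-insensitive match on the taxonomy string
    if total_reads < MIN_TOTAL_READS:
        return False
    if samples_present < MIN_SAMPLES:
        return False
    if any(taxon in label for taxon in EXCLUDED_TAXA):
        return False
    return True


def filter_table(table: Dict[str, Dict[str, int]],
                 taxonomy: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Keep only ASVs that pass keep_asv; table maps ASV id -> {sample id: read count}."""
    return {
        asv: counts
        for asv, counts in table.items()
        if keep_asv(counts, taxonomy.get(asv, ""))
    }


# Hypothetical example data, for illustration only.
example_table = {
    "asv1": {"s1": 40, "s2": 12},  # kept
    "asv2": {"s1": 3, "s2": 4},    # removed: fewer than 10 reads in total
    "asv3": {"s1": 50, "s2": 0},   # removed: present in only one sample
    "asv4": {"s1": 20, "s2": 30},  # removed: classified as Chloroplast
}
example_taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes",
    "asv2": "k__Bacteria; p__Bacteroidetes",
    "asv3": "k__Bacteria; p__Firmicutes",
    "asv4": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
}
filtered = filter_table(example_table, example_taxonomy)
```

Matching on the taxonomy string is case-insensitive because the capitalisation of organelle labels can vary between reference databases.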
