Sequence data processing

Yanxian Li; Leonardo Bruni; Alexander Jaramillo-Torres; Karina Gajardo; Trond M. Kortner; Åshild Krogdahl

This method is extracted from research article: Anim Microbiome, Jan 2021

Differential response of digesta- and mucosa-associated intestinal microbiota to dietary insect meal during the seawater phase of Atlantic salmon

DOI: 10.1186/s42523-020-00071-3

The raw sequence data were processed with DADA2 1.14 in R 3.6.3 [119] to infer amplicon sequence variants (ASVs) [120]. Specifically, the demultiplexed paired-end reads were trimmed to remove the primer sequences (forward reads, first 20 bp; reverse reads, first 18 bp), truncated at the position where the median Phred quality score dropped sharply (forward reads, at position 290 bp; reverse reads, at position 248 bp) and filtered to remove low-quality reads. After trimming and filtering, the run-specific error rates were estimated and the ASVs were inferred by pooling reads from all the samples sequenced in the same run. After the reads were merged, chimeras were removed using the “pooled” method. The resulting raw ASV table and representative sequences were imported into QIIME2 (version 2020.2) [121]. Taxonomy was assigned by a scikit-learn naive Bayes machine-learning classifier [122] trained on the SILVA 132 99% OTUs [123], which had been trimmed to include only the regions of the 16S rRNA gene amplified by our primers. ASVs identified as chloroplasts or mitochondria were excluded from the ASV table. The ASV table was conservatively filtered to remove ASVs that had no phylum-level taxonomic assignment or appeared in only one biological sample. Contaminating ASVs were identified based on two suggested criteria: contaminants are often found in negative controls and inversely correlate with sample DNA concentration [98]. The ASVs filtered from the raw ASV table were also removed from the representative sequences, which were then inserted into a reference phylogenetic tree built on the SILVA 128 database using SEPP [124]. The alpha rarefaction curves and the core metrics results were generated with a sampling depth of 10,000 and 2047 sequences per sample, respectively (Fig. S9).
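The conservative ASV filtering rules described in this paragraph can be illustrated with a minimal Python sketch. This is not the authors' code: the toy count table, the lineage strings and the function name `keep_asv` are hypothetical, and the SILVA-style rank labels are shown only for illustration. The sketch drops ASVs classified as chloroplasts or mitochondria, ASVs without a phylum-level assignment, and ASVs detected in fewer than two samples.

```python
# Hypothetical toy data: ASV id -> read counts per biological sample.
counts = {
    "ASV1": {"S1": 120, "S2": 80, "S3": 0},
    "ASV2": {"S1": 15, "S2": 0, "S3": 0},   # detected in one sample only
    "ASV3": {"S1": 40, "S2": 22, "S3": 9},  # chloroplast
    "ASV4": {"S1": 5, "S2": 7, "S3": 3},    # no phylum-level assignment
    "ASV5": {"S1": 0, "S2": 30, "S3": 60},  # mitochondria
    "ASV6": {"S1": 0, "S2": 11, "S3": 4},
}

# Hypothetical lineage strings (semicolon-separated ranks, domain first).
taxonomy = {
    "ASV1": "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli",
    "ASV2": "D_0__Bacteria;D_1__Proteobacteria",
    "ASV3": "D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast",
    "ASV4": "D_0__Bacteria",
    "ASV5": "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
            "D_3__Rickettsiales;D_4__Mitochondria",
    "ASV6": "D_0__Bacteria;D_1__Actinobacteria",
}


def keep_asv(asv_id, counts, taxonomy):
    """Return True if the ASV passes all three filtering rules."""
    lineage = taxonomy.get(asv_id, "")
    lowered = lineage.lower()

    # Rule 1: exclude organelle-derived sequences.
    if "chloroplast" in lowered or "mitochondria" in lowered:
        return False

    # Rule 2: require at least a phylum-level assignment
    # (assumed here to be the second rank in the lineage).
    ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    if len(ranks) < 2:
        return False

    # Rule 3: exclude ASVs detected in fewer than two samples.
    n_samples_present = sum(1 for c in counts[asv_id].values() if c > 0)
    return n_samples_present >= 2


kept = [asv_id for asv_id in counts if keep_asv(asv_id, counts, taxonomy)]
print(kept)
```

With the toy data, only ASV1 and ASV6 survive the three rules. Contaminant identification (presence in negative controls, inverse correlation with DNA concentration) is a separate step and is not covered by this sketch.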
For downstream data analysis and visualization, QIIME2 artifacts were imported into R using the qiime2R package [125] and a phyloseq [126] object was assembled from the sample metadata, ASV table, taxonomy and phylogenetic tree. The core ASVs were calculated using a prevalence threshold of 80% and visualized with a Venn diagram. The alpha-diversity indices, including observed ASVs, Pielou’s evenness, Shannon’s index and Faith’s phylogenetic diversity (PD), were computed via the R packages microbiome [127] and picante [128]. For beta-diversity analyses, we used four distance matrices: Jaccard distance, unweighted UniFrac distance, Aitchison distance and phylogenetic isometric log-ratio (PHILR) transformed Euclidean distance. Because rarefying remains the best solution for unweighted distance matrices [129], the Jaccard distance and unweighted UniFrac distance were computed in QIIME2 using the rarefied ASV table. The compositionality-aware distance matrices, Aitchison distance and PHILR transformed Euclidean distance, were calculated using the unrarefied ASV table. The Aitchison distance was computed with the DEICODE plugin in QIIME2, which implements a form of Aitchison distance that is robust to high levels of sparsity because it uses matrix completion to handle the excessive zeros in microbiome data [130]. The PHILR transform of the ASV table was performed in R using the philr package [131]. The selected distance matrices were explored and visualized by principal coordinates analysis (PCoA).
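To make the non-phylogenetic indices named in this paragraph concrete, the following Python sketch computes observed ASVs, Shannon's index, Pielou's evenness and the Jaccard distance from plain count vectors. It is an illustration only, not the authors' workflow, which used the R packages and QIIME2 plugins cited in the text. The function names and example counts are hypothetical, the natural logarithm is assumed for Shannon's index, and the Jaccard distance is computed on presence/absence.

```python
import math


def observed_asvs(sample_counts):
    """Number of ASVs with at least one read in the sample."""
    return sum(1 for c in sample_counts if c > 0)


def shannon(sample_counts):
    """Shannon's index H = -sum(p_i * ln p_i) over ASVs present in the sample."""
    total = sum(sample_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in sample_counts if c > 0)


def pielou(sample_counts):
    """Pielou's evenness J = H / ln(S); undefined (NaN) when S < 2."""
    richness = observed_asvs(sample_counts)
    if richness < 2:
        return float("nan")
    return shannon(sample_counts) / math.log(richness)


def jaccard_distance(counts_a, counts_b):
    """Presence/absence Jaccard distance: 1 - |A intersect B| / |A union B|."""
    present_a = {i for i, c in enumerate(counts_a) if c > 0}
    present_b = {i for i, c in enumerate(counts_b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Hypothetical rarefied count vectors; positions correspond to the same ASVs.
sample_1 = [10, 10, 10, 10]
sample_2 = [0, 25, 0, 15]

print(observed_asvs(sample_1), shannon(sample_1), pielou(sample_1))
print(jaccard_distance(sample_1, sample_2))
```

A perfectly even sample such as `sample_1` gives a Pielou's evenness of 1, which is a quick sanity check on the implementation. Faith's PD, UniFrac, the DEICODE Aitchison distance and PHILR additionally require a phylogenetic tree or matrix completion and are outside the scope of this sketch.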

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
