Shotgun metagenomics sequencing and analysis

Joseph D. Madison
Brandon C. LaBumbard
Douglas C. Woodhams
Ruslan Kalendar

Sequencing libraries were prepared following the NEB workflow for the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA). Paired-end 151 bp sequencing was carried out by the University of Minnesota Genomics Core on an Illumina NovaSeq 6000 using an S4 flowcell, with an average read depth of 23,844,087 reads/sample (range: 9,108,413–54,875,278 reads/sample; see S1 Table).

For analysis, raw Illumina output was first sorted by index-linked sample ID. Sorted reads were then quality inspected with FastQC, followed by removal of host and background nucleic acids. Host removal was performed by constructing custom host databases of R. temporaria and B. bufo with Kraken2 [18]. Reads that did not match the host database at a confidence threshold of 0.5 were written to a separate file for downstream analysis. The R. temporaria genome was used because a complete whole-genome reference is not available for R. pipiens (the host from which swabs were derived). The R. temporaria genome has high homology to R. pipiens and the same karyotype [19], making it an acceptable reference for this filtering step. Likewise, sequences matching B. bufo at a confidence threshold of 0.5 were removed, because separate Anaxyrus americanus museum specimens were present and processed in the same laboratory space.
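The host-filtering step described above can be sketched as a small helper that assembles a Kraken2 command line. This is a minimal illustration, not the authors' actual invocation: the database path, file names, output prefix, and thread count are hypothetical, and only the 0.5 confidence threshold comes from the text. The flags shown (`--db`, `--paired`, `--confidence`, `--unclassified-out`, `--output`, `--report`, `--threads`) are standard Kraken2 options.

```python
from typing import List


def kraken2_filter_cmd(db: str, r1: str, r2: str, out_prefix: str,
                       confidence: float = 0.5, threads: int = 4) -> List[str]:
    """Build a Kraken2 command that keeps read pairs NOT classified against `db`.

    All paths and the default thread count are illustrative assumptions.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return [
        "kraken2",
        "--db", db,
        "--paired",
        "--confidence", str(confidence),
        "--threads", str(threads),
        # With --paired, Kraken2 replaces '#' with _1 and _2 in the output names.
        "--unclassified-out", f"{out_prefix}_nonhost#.fastq",
        "--output", f"{out_prefix}.kraken2.txt",
        "--report", f"{out_prefix}.kraken2.report.txt",
        r1, r2,
    ]


# Hypothetical database and sample names, for illustration only.
cmd = kraken2_filter_cmd("host_db", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Kraken2 and the custom database are available. Raising `confidence` makes classification stricter, so fewer reads are flagged as matches and removed.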

Background swabs, processed in parallel but without specimen swabbing, were also used as controls for removal of background contamination. Following the same procedure as for host removal, a Kraken2 database was constructed from the control swabs and matching reads were removed. A more conservative confidence threshold of 0.8 was used to reduce the risk of erroneously removing sample bacteria that closely match bacteria in the background (S1 Fig). Exact matches leading to removal of sample sequences were considered unlikely, based on differential degradation patterns in the specimens and on visual inspection of sequence taxonomy tables. After quality inspection, host removal, and background removal, the output was analyzed in the Qiita metagenomic analysis pipeline [20] and is available in the Data Accessibility and Benefit-Sharing section below.

Qiita analysis followed the recommended shotgun metagenomics pipeline. This workflow includes an initial adapter removal step with fastp [21], human host filtering with minimap2 (to account for sample contamination during handling) [22], and taxonomic profiling via bowtie2 [23] against either the WoL reference database [24] (representing Bacteria and Archaea) or the Rep 200 database, which is composed of RefSeq assemblies [25] (representing Archaea, Bacteria, Fungi, Protozoa, and Viruses). Species-level feature tables in .qza format were then generated with Woltka [26] and analyzed with QIIME2 [27]. Shotgun metagenomic data were not rarefied prior to downstream diversity analyses, because alpha diversity qualitatively continued to increase with sequencing depth (no plateau) and because the effects of rarefaction on shotgun metagenomic data are less well understood than for 16S rRNA gene sequencing data.
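To make the rarefaction reasoning concrete, the sketch below subsamples a feature-count vector without replacement at several depths and reports the number of observed features at each depth. It is an illustration only, not the QIIME2 implementation: the function names and the toy counts are hypothetical. If the observed-feature count is still rising at the largest depth, the curve has not plateaued, which is the pattern described above.

```python
import random
from typing import Dict, List, Tuple


def rarefy(counts: Dict[str, int], depth: int, rng: random.Random) -> Dict[str, int]:
    """Subsample `depth` reads without replacement from a feature-count table."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the total number of reads")
    # Expand counts into one entry per read, then draw without replacement.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    sub: Dict[str, int] = {}
    for feature in rng.sample(pool, depth):
        sub[feature] = sub.get(feature, 0) + 1
    return sub


def rarefaction_curve(counts: Dict[str, int], depths: List[int],
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Return (depth, observed features) pairs; depths above the total are skipped."""
    rng = random.Random(seed)
    total = sum(counts.values())
    return [(d, len(rarefy(counts, d, rng))) for d in depths if d <= total]


# Toy counts for illustration only (not data from the study).
toy_sample = {"taxon_a": 500, "taxon_b": 120, "taxon_c": 30, "taxon_d": 4, "taxon_e": 1}
for depth, observed in rarefaction_curve(toy_sample, [10, 50, 200, 655]):
    print(depth, observed)
```

Expanding counts into a per-read list is simple but memory-hungry, so for real read depths in the tens of millions a hypergeometric sampler would be a more practical choice.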
