2.5. Fecal DNA Extraction, Whole Genome Shotgun Sequencing and Metagenomic Analyses

Rita R. Colwell

Total DNA was isolated from fecal specimens using the MOBIO Power-Soil® DNA Isolation Kit (Qiagen, Germantown, MD, USA), following the manufacturer’s instructions. Each DNA sample was normalized in 3–18 µL of nuclease-free water to a final concentration of 0.5 ng µL−1 using a Biomek FX liquid handler (Beckman Coulter Life Sciences, Brea, CA, USA). Libraries were constructed using the Nextera XT Library Prep Kit (Illumina, San Diego, CA, USA). For each sample, an input of 0.5 ng was used in the tagmentation reaction, followed by 13 cycles of PCR amplification using Nextera i7 and i5 index primers and 2X KAPA master mix, per the modified Nextera XT protocol. The PCR products were purified using 1.0X speed beads and eluted in 15 µL of nuclease-free water. Final libraries were quantified by PicoGreen fluorometric assay (100X final dilution), and concentrations were in the range of 0.1–4.0 ng µL−1. The libraries were pooled in equimolar ratios based on the concentrations determined by PicoGreen and loaded onto a high sensitivity (HS) chip run on the Caliper LabChipGX (Perkin Elmer, Waltham, MA, USA). The fragment sizes reported were in the range of 301–680 bp. Samples were sequenced on a single Illumina HiSeq v3 flowcell, multiplexing eight libraries per lane and targeting 25 million 100 bp reads per sample. Standard read quality assessments were performed prior to metagenomic analyses using the open source BBDuk software from BBTools (https://jgi.doe.gov/data-and-tools/), and all samples conformed to an average read quality of Q20, indicating 99% base-call accuracy (https://www.illumina.com/science/education/sequencing-quality-scores.html). Reads per sample were consistent, 35,000,000 ± 14,000,000 reads/sample (median 32,000,000), indicating comparable read depth.
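
The two calculations implied in this paragraph, the Phred-to-accuracy conversion behind the Q20 figure and the volumes needed for equimolar pooling, can be sketched as follows. This is a minimal illustration and not the laboratory's actual worksheet: the function names, the 660 g/mol per base pair approximation for double-stranded DNA, the use of a mean fragment size per library, and the example values are all assumptions introduced here.

```python
def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by a Phred quality score Q: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (an approximation).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries: dict, fmol_each: float) -> dict:
    """Volume (µL) of each library that contributes `fmol_each` femtomoles.

    `libraries` maps a library name to (concentration in ng/µL, mean size in bp).
    Because 1 nM equals 1 fmol/µL, volume = fmol / molarity.
    """
    return {
        name: fmol_each / library_molarity_nM(conc, size_bp)
        for name, (conc, size_bp) in libraries.items()
    }


if __name__ == "__main__":
    print(f"Q20 accuracy: {phred_to_accuracy(20):.2%}")
    # Hypothetical libraries, chosen inside the ranges reported in the text
    # (0.1-4.0 ng/µL, 301-680 bp); these are not measured values.
    example = {"lib_A": (0.8, 450.0), "lib_B": (2.4, 520.0)}
    for name, vol in equimolar_volumes(example, fmol_each=5.0).items():
        print(f"{name}: {vol:.2f} µL")
```

A Q score of 30 would correspond to 99.9% accuracy under the same formula, which is why Q20 is generally treated as a minimum rather than a target.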

Unassembled whole genome shotgun metagenomic sequencing reads were directly analyzed using the CosmosID, Inc. bioinformatics software package (CosmosID Inc., Rockville, MD, USA), as described [22,23,24,25], to achieve bacterial identification to the species, subspecies, and/or strain level and quantification of microorganism relative abundance. Briefly, the system utilizes a high performance data-mining k-mer algorithm and highly curated dynamic comparator databases (GeneBook®, CosmosID, Inc., Rockville, MD, USA) that rapidly disambiguate millions of short reads into the discrete genomes or genes from which the sequences originate. The GeneBook® databases are composed of over 150,000 microbial genomes and gene sequences representing over 1000 bacterial, 5000 viral, 250 protist, and 1500 fungal species, as well as over 5500 antibiotic resistance and virulence-associated genes. Each GeneBook® database was screened and cleaned of host genome sequences, including human, pig, and dog genomes, and then validated by analyzing each host genome as a query against the curated databases. The web portal is hosted on the AWS cloud and can be accessed at https://app.cosmosid.com/login.

Metagenomic analysis is based on a proprietary high performance data-mining k-mer algorithm, implemented in C, as the core engine. The analysis algorithm has two separable comparators: a pre-computation phase for the reference database and a per-sample computation. The input to the pre-computation phase is a reference microbial genome or antibiotic resistance GeneBook® database, and its output consists of phylogeny trees, together with sets of variable length k-mer fingerprints (biomarkers) that are uniquely identified with distinct nodes, branches, and leaves of the tree. The reference GeneBook® database comprises both publicly available genomes or gene sequences, such as those from NCBI RefSeq/WGS/SRA/nr, PATRIC, M5NR, IMG, ENA, DDBJ, CARD, ResFinder, ARDB, ARG-ANNOT, mvirdb, and VFDB, and a subset of genomes sequenced by CosmosID, Inc. and its collaborators. The second, per-sample computational phase searches the hundreds of millions of short sequence reads, or contigs from draft assembly, against the fingerprint sets. The resulting statistics are analyzed to give fine-grain composition and relative abundance estimates. Edit distance-scoring techniques are used to compare a target genome or gene with the reference set. The algorithm provides functionality similar to that of BLAST. Classification precision is maintained by employing aggregation statistics. Enhanced detection specificity is achieved by running comparators in sequence. In summary, the two-part analysis consists of first finding reads in which there is an exact match with a k-mer uniquely identified with a GeneBook® reference database, and then statistically scoring the entire read against the GeneBook® reference to verify that the read is indeed uniquely identified with that reference. For each sample, the reads from a species are assigned to the strain with the highest aggregation statistics.
Similarly, the community resistome, the collection of antibiotic resistance genes in the microbiome, was also identified using the CosmosID, Inc. bioinformatics software package to query unassembled sequence reads against the CosmosID, Inc. curated antibiotic resistance gene database in a manner analogous to the bacterial species identification.
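
The general idea of a two-phase, unique k-mer approach can be illustrated with a toy sketch. This is not the proprietary CosmosID engine: it uses fixed-length k-mers rather than variable-length fingerprints, a flat set of references rather than a phylogeny tree, and simple vote counting in place of edit-distance scoring and aggregation statistics. All function names and sequences below are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of `seq`."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_unique_index(references: dict, k: int) -> dict:
    """Pre-computation phase: map each k-mer found in exactly one reference
    to that reference's name. Shared k-mers are discarded as uninformative."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def classify_read(read: str, index: dict, k: int):
    """Per-sample phase: assign a read to the reference with the most
    unique k-mer hits, or None when no unique k-mer matches."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def relative_abundance(reads, index: dict, k: int) -> dict:
    """Percentage of classified reads assigned to each reference."""
    hits = (classify_read(r, index, k) for r in reads)
    counts = Counter(h for h in hits if h is not None)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy sequences for illustration only.
    refs = {"taxon_A": "ACGTACGTTT", "taxon_B": "ACGTGGGGCC"}
    idx = build_unique_index(refs, k=4)
    print(relative_abundance(["GTACG", "GGGGC", "GTACG", "ACGT"], idx, k=4))
```

In this sketch the read `ACGT` is left unclassified because its only k-mer occurs in both references, which mirrors the requirement that a fingerprint be uniquely identified with one reference.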

Analyses of the bacterial sequence data included Shannon alpha diversity [26], principal component analyses, stacked bar graphs, and heatmaps based on relative abundance of each microorganism (%) in each sample using the NMF R software package [27]. Resistome analysis was performed by identification of antibiotic-resistance genes based on percentage of gene coverage for each gene as a function of the gene-specific read frequency in each sample. Statistical analyses were performed using Microsoft Excel 2016 (Microsoft Corporation, Redmond, WA, USA) or GraphPad Prism 7 (GraphPad Company, San Diego, CA, USA).
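
For reference, the relative abundance percentages and Shannon diversity index mentioned above can be computed as in the following minimal sketch. The natural logarithm is assumed here, since the text does not state the base used; the function names and example counts are hypothetical and do not reproduce the NMF-based workflow.

```python
import math


def relative_abundance_percent(counts: dict) -> dict:
    """Convert per-taxon counts to relative abundance (%) within one sample."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def shannon_index(counts: dict) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    The natural logarithm is an assumption; other bases rescale the value.
    """
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical per-taxon read counts for a single sample.
    sample = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
    print(relative_abundance_percent(sample))
    print(round(shannon_index(sample), 3))
```

A sample dominated by a single taxon gives a value near 0, while a perfectly even community of S taxa gives ln(S), the maximum for that richness.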
