Whole metagenome sequencing

Ana C. Henriques; Rui M.S. Azevedo; Paolo De Marco

This method is extracted from research article: PeerJ, Oct 2016

Metagenomic survey of methanesulfonic acid (MSA) catabolic genes in an Atlantic Ocean surface water sample and in a partial enrichment

DOI: 10.7717/peerj.2498

The metagenomes (from samples SCD0 and SCDE) were sequenced at Molecular Research LP (Shallowater, TX, USA). Paired-end sequencing libraries (2 × 101 bp) were prepared and sequenced on the Illumina HiSeq 2500 platform. The libraries were prepared using the Nextera DNA Sample Preparation Kit (Illumina) following the manufacturer’s user guide. The initial DNA concentration was measured using the Qubit® dsDNA HS Assay Kit (Life Technologies). The samples were then diluted to achieve the recommended DNA input of 50 ng at a concentration of 2.5 ng/µL. Subsequently, the samples underwent fragmentation, addition of adapter sequences and PCR amplification (5 cycles), during which a unique index was added to each sample. The average library size was determined using the Agilent 2100 Bioanalyzer (Agilent Technologies). The libraries were then pooled in equimolar ratios at 2 nM, and 5 µL of the library pool was clustered using the cBot (Illumina) and sequenced paired-end for 200 cycles using the HiSeq 2500 system (Illumina).
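The dilution to 50 ng at 2.5 ng/µL implies a final volume of 20 µL per sample. The following is a minimal sketch of that arithmetic, not part of the original protocol; the function name and the example Qubit reading of 12.5 ng/µL are hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng=50.0, target_ng_per_ul=2.5):
    """Return (stock_ul, diluent_ul) giving target_ng of DNA at target_ng_per_ul.

    Defaults follow the DNA input stated in the text (50 ng at 2.5 ng/uL).
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    final_ul = target_ng / target_ng_per_ul   # 50 / 2.5 = 20 uL final volume
    stock_ul = target_ng / stock_ng_per_ul    # volume of stock holding 50 ng
    return stock_ul, final_ul - stock_ul


# Hypothetical Qubit reading, for illustration only.
stock_ul, diluent_ul = dilution_volumes(12.5)
print(f"stock: {stock_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```

A stock below 2.5 ng/µL cannot reach the target by dilution, so the sketch raises an error in that case rather than returning a negative diluent volume.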

The quality of the sequencing reads from both libraries was assessed using FastQC on the Galaxy web-based platform (https://usegalaxy.org/). The libraries were also checked for human contamination with Kraken Metagenomics (Zaharia et al., 2011; Wood & Salzberg, 2014) at Illumina BaseSpace (http://basespace.illumina.com/home/index). As the proportion of human sequences was very low (0.15% for sample SCD0 and 0.04% for sample SCDE), no filtering was performed before the subsequent analysis steps (general statistics are given in Table S1).

Sequencing data for the two samples, SCD0 and SCDE, were submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under project number PRJEB9018 and sample accession numbers ERS700852 and ERS700853, respectively. The analyses of the metadata were performed through the EBI Metagenomics service pipeline (https://www.ebi.ac.uk/metagenomics/pipelines/2.0; Mitchell et al., 2015a), which includes a quality control step, a taxonomic analysis step based on 16S rDNA sequences and a functional analysis of predicted protein-coding sequences using the InterPro resource (Mitchell et al., 2015b).

BIOM (see http://biom-format.org/) files containing phylogenetic classification information provided by EBI Metagenomics were used to construct rarefaction curves using MEGAN (version 5.10.6; Huson, Mitra & Ruscheweyh, 2011). Phylogenetic data were also used to estimate the alpha diversity of the two samples (Shannon index (Shannon, 1948), evenness (Mulder et al., 2004) and Chao species estimator (Chao, 1984)). For beta diversity analysis, the Jaccard (Jaccard, 1912), Kulczynski (Faith, Minchin & Belbin, 1987) and Chao (Chao, Chazdon & Shen, 2005) indices were calculated with the Vegan package (Oksanen et al., 2015). The Bray-Curtis dissimilarity index (Bray & Curtis, 1957) was calculated from relative abundances. Only significant differences in phylogenetic or functional composition were retained (Fisher’s exact / χ² test at a significance level of 0.05 with a Bonferroni correction for the number of comparisons, in order to minimize false positives; code was adapted from Metastats (White, Nagarajan & Pop, 2009)).
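To make the diversity measures concrete, the sketch below computes several of them from per-taxon read counts in plain Python. It is an illustration under stated assumptions, not the authors' code (the text reports using the Vegan package in R). Evenness is taken here as Pielou's J (Shannon index divided by the log of richness), Chao is the classic Chao1 form with a bias-corrected fallback when there are no doubletons, and Jaccard is the presence/absence form. The exact variants used in the original analysis are not stated in this section, and all function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Assumed evenness measure: J = H' / ln(S), S = number of observed taxa."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0


def chao1(counts):
    """Chao1 richness estimate from singletons (f1) and doubletons (f2)."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form when f2 == 0


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity computed on relative abundances."""
    ra = [x / sum(a) for x in a]
    rb = [y / sum(b) for y in b]
    return sum(abs(x - y) for x, y in zip(ra, rb)) / (sum(ra) + sum(rb))


def jaccard_distance(a, b):
    """Presence/absence Jaccard distance: 1 - |A and B| / |A or B|."""
    pa = {i for i, c in enumerate(a) if c > 0}
    pb = {i for i, c in enumerate(b) if c > 0}
    union = pa | pb
    return 1 - len(pa & pb) / len(union) if union else 0.0


# Hypothetical count vectors over the same ordered list of taxa.
sample_a = [2, 2, 0]
sample_b = [2, 0, 2]
print(shannon(sample_a), pielou_evenness(sample_a), chao1(sample_a))
print(bray_curtis(sample_a, sample_b), jaccard_distance(sample_a, sample_b))
```

Both count vectors must list taxa in the same order, with zeros for absent taxa, so that positions line up between samples.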

Assembled metagenomic data from both samples were also submitted to the DOE Joint Genome Institute’s Integrated Microbial Genomes Metagenomic Expert Review (IMG/MER) annotation pipeline (http://img.jgi.doe.gov/) for functional and taxonomic annotation (Markowitz et al., 2014) (SCD0: GOLD Analysis Project Id Ga0069134/biosample ID Gb0111627; SCDE: GOLD Analysis Project Id Ga0069135/biosample ID Gb0111630). Before submission, reads were quality checked (FastQC for quality control and Galaxy tools for trimming and filtering): only sequences with quality scores equal to or higher than 20 over 95% or more of the nucleotides were kept. For each sample, forward and reverse sequence files were merged and assembled using Megahit (Li et al., 2015) (general statistics are reported in Table S2). A further analysis was performed at MG-RAST (Meyer et al., 2008) using the subsystems approach (Overbeek et al., 2005): metagenome SCD0 was submitted under sample no. 4698364.3 and SCDE under sample no. 4698363.3. Binning was performed using MetaBat v0.25.4 (Kang et al., 2015) and the bins obtained were analyzed by MG-RAST.
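The read-retention rule stated above (quality score of at least 20 over at least 95% of the nucleotides) can be sketched as a per-read check on the FASTQ quality string. This is an illustration only: the filtering in the text was done with Galaxy tools, the function name is hypothetical, and Phred+33 quality encoding is an assumption.

```python
def passes_quality_filter(quality, min_q=20, min_fraction=0.95, offset=33):
    """Return True if at least min_fraction of bases have Phred score >= min_q.

    `quality` is the quality line of a FASTQ record; `offset` is the ASCII
    offset of the encoding (33 is assumed here).
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return good / len(quality) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(passes_quality_filter("I" * 96 + "#" * 4))   # 96% of bases at Q >= 20
print(passes_quality_filter("I" * 94 + "#" * 6))   # 94% of bases at Q >= 20
```

For paired-end data, a further design decision (not specified in the text) is whether to drop both mates when only one fails the check.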

A flowchart with the major steps of the analysis is available in Fig. S1.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
