Metagenomic analysis

AM Amir Mohaghegh Motlagh
AB Ananda S. Bhattacharjee
FC Felipe H. Coutinho
BD Bas E. Dutilh
SC Sherwood R. Casjens
RG Ramesh K. Goel
request Request a Protocol
ask Ask a question
Favorite

Paired-end raw reads were interleaved, quality filtered and trimmed using CLC Genomics Workbench v.7.0.4 (CLC Bio, Denmark) with a threshold of 100 bp as the minimum length of read and a Phred score of 28. The trimmed reads were de novo assembled using CLC Genomics Workbench v.7.0.4 with the following criteria: word size of 20 bp, automatic bubble size of 50 bp, and minimum contig length of 500 bp. Identification of open reading frames (ORFs) and gene prediction were performed using MetaGeneMark v.2.8 (Zhu et al., 2010) followed by gene annotation using the RPSBLAST program (Altschul et al., 1990) on the clusters of orthologous group (COG) prokaryotic protein database (Tatusov et al., 2000). Statistical over-representation of annotated COG genes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways was determined by pairwise comparisons of each metagenomic sample using Fisher's exact test, with confidence intervals at 99% significance. Besides COG analysis, the predicted genes were categorized into SEED Subsystems to provide consistent and accurate genome annotations (Overbeek et al., 2014) through MG-RAST (metagenomics rapid annotation using subsystem technology) pipeline to identify all of the encoded proteins and their functions in the host metabolism. The enzymes were compared against SEED Subsystems using a maximum e-value of 1E-5, a minimum identity of 60%, and a minimum alignment length of 100 measured in amino acids for protein databases. The minimum alignment length of 100 amino acids (equal to 300 bps) was chosen as the criterion to specify the possible functional genes (not only partial genes) that are transferred from prokaryotes to the phages.

In order to understand the phage-host interactions, the GC contents of phage and prokaryote contigs were calculated and compared with contig percentage shown in a histogram. The occurrence of each single tetranucleotide in the phage and prokaryote contigs was calculated using JSpecies v.1.2.1 (Richter and Rossello-Mora, 2009). In addition, the calculated GC content of phage and prokaryote contigs were graphed along with the scaffold length and contig coverage using the ggplot package in R v.3.0.1. The sequences are deposited in MG-RAST repository with accession number mgm4582011.3 and mgm4582012.3 for bacteria and phage sequences, respectively.

The taxonomic composition of the prokaryotic metagenome was determined by comparison of the assembled contigs against the NCBI non-redundant (nr) nucleotide database using tBLASTx v. 2.2.28 program (Altschul et al., 1990) with 1E-5 e-value cut-off. Relative abundance of taxa was assessed on the basis of average counts of mapped reads on the BLAST annotated contigs. Therefore, for the reads that were mapped to contigs, every blast hit was multiplied by the average depth of the reads on each contig. Eventually, significant hits to GenBank entries were recorded in a BLAST output file and imported on MEGAN v.5 (Huson et al., 2007) for interpretation with the lowest common ancestor (LCA) method followed by visualization in iTOL online tool (Letunic and Bork, 2007). For the MEGAN analysis, default parameters for each metagenome were selected as follows: minimum support of 5, minimum score of 50, top percent of 10, win-score of 0, and minimum complexity of 0.44.

The affinities of the sequences for known metabolic functions were also annotated using BLASTx with cut-off e-value of 1E-5 against SEED subsystems v.2.0 (Overbeek et al., 2005) using MG-RAST server v.3.3 (Meyer et al., 2008) and KEGG metabolic pathways (Kanehisa et al., 2008). In addition, to understand the role of prokaryotic communities in nutrient cycles, the identified prokaryotes were compared manually with a curated database on functional genes (FunGene) involved in various biogeochemical cycles (Fish et al., 2013).

In addition to initial viral particle purification step prior to sequencing and DNase treatment of the purified phage prior to DNA extraction, in order to exclude any bacterial contamination from the phage contigs, genes encoding the 5S, 16S, and 23S rRNAs (from prokaryotic genomes) were identified in phage contigs using hmm_rRNA on WebMGA server based on an HMM search using HMMER v.3 (Finn et al., 2011).

Since analysis of the phage metagenome was also performed on contigs instead of reads, the relative abundance of contigs was corrected by the average depth of the reads on each contig as mentioned earlier. For phage taxonomic analysis, the filtered phage contigs were compared against NCBI RefSeq viral genome database by using tBLASTx v. 2.2.28 program with e-value of 1E-5 cut-off. The generated BLAST output file was imported on MEGAN v.5 for interpretation by lowest common ancestor (LCA) method using the default parameters as mentioned above (results not shown). Instead, the abundance of phages from the GSL was explored with respect to metagenomes of marine, freshwater, and lake sediment samples. Raw reads of aquatic viromes available on Metavir (Roux et al., 2011) were mapped to assembled phage contigs with length longer than 20 kbp as putative complete phages using Bowtie v.2 (Langmead and Salzberg, 2012). Contig abundances were obtained by counting the number of reads mapped to each contig and correcting by contig length. The obtained abundance matrix with identity percentages was normalized and plotted as a heatmap in which samples and contigs were clustered according to their Euclidean distance using R v.3.0.1.

To achieve a more detailed understanding of the phage-host interactions, mobile genetic elements including prophages, plasmids, and transposons were investigated in the prokaryote contigs. Following gene prediction in the prokaryote contigs, integrated prophages were detected using Prophinder (Lima-Mendez et al., 2008) to identify mobile genetic elements. Simultaneously, in order to relate the prophages to their host, prokaryote contigs were compared against ACLAME (A CLAssification of Mobile genetic Elements) database (Leplae et al., 2004) using a BLASTn search with e-value 1E-5 cut-off, followed by extraction of the prophage hit regions (excluding viruses and plasmids) from the BLAST output results. These prophage regions were compared with prophages found using Prophinder and the shared prophages with their corresponding prokaryotic hosts were extracted to generate the taxonomic cladogram using MEGAN with default parameters mentioned earlier. In addition, nutrient cycling of the prokaryotic hosts was manually annotated based on the KEGG database and used to generate the prophage-host interaction network using iTOL online tool.

Clustered regularly interspaced short palindromic repeat (CRISPR) is an anti-viral mechanism in archaea and bacteria wherein genomic sequences from the predatory viruses are integrated in the host genome providing immunity to these viruses (Horvath and Barrangou, 2010). Therefore, CRISPR spacer sequences can provide a direct link between viruses and their prokaryotic hosts (Kunin et al., 2008; Heidelberg et al., 2009; Anderson et al., 2011) and CRISPR spacers may be viewed as a database of fragments derived from phage and plasmid genomes. In addition, phage genome can acquire some prokaryotic genes during an infection event of the host and therefore, homology of phage and prokaryotic genes can indicate phage-host associations. In our study, the assembled bacteria contigs were compared against phage contigs as the database by using tBLASTx v. 2.2.28 setting a 1E-5 e-value cut-off to find their homology. Afterward, all CRISPR arrays in the prokaryote contigs with homology to phage contigs were identified using CRISPR Recognition Tool (CRT) v.1.1 (Bland et al., 2007). Putative protospacer targets were also identified using CRISPRTarget (Biswas et al., 2013), following a BLASTn search of the spacer input against ACLAME, GenBank Environmental, GenBank Phage, and RefSeq plasmid, RefSeq viral databases with the default settings.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

0/150

tip Tips for asking effective questions

+ Description

Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.

post Post a Question
0 Q&A