Aerial root mucilage metagenome analysis with MG-RAST

Tania Pozzo
Shawn M. Higdon
Sivakumar Pattathil
Michael G. Hahn
Alan B. Bennett

The five sequenced metagenomic libraries used in this study were generated as part of a related project examining the microbiomes associated with maize plants from the same geographic origin [11]. All five metagenomes were derived from mucilage samples collected from maize plants grown in the Sierra Mixe region of Mexico. Four samples were collected on August 1, 2008, and the fifth on August 27, 2013. Each sample was shipped to U.C. Davis, where total DNA was extracted using a DNA isolation kit (Mo Bio Laboratories, Inc., USA). Illumina sequencing libraries were prepared from the total DNA extract of each mucilage sample using a modified version of the Nextera transposase-based library construction method with multiplex barcoding. All mucilage metagenome libraries were sequenced on Illumina MiSeq and HiSeq 2000 instruments. Sequences were demultiplexed and trimmed using Trimmomatic (version 0.33) with the following parameters: ILLUMINACLIP (2:30:10), HEADCROP:15, LEADING:20, TRAILING:20, SLIDINGWINDOW:4:20, and MINLEN:100. PhiX and maize sequences (genomic, chloroplast, and mitochondrial) in the mucilage metagenomes were identified with Bowtie2 [19] by alignment against the PhiX genome (GenBank acc# NC_001422.1) and the Zea mays cultivar B73 draft genome (RefSeq assembly acc# GCF_000005005.2). These sequences were then removed from the mucilage metagenomes before upload to MG-RAST.
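To make the quality-trimming parameters above concrete, here is a simplified Python re-implementation of the quality-based Trimmomatic steps, applied in the order listed (HEADCROP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN). It is a sketch for understanding the parameters, not Trimmomatic's own code: it assumes Phred+33 quality encoding, omits ILLUMINACLIP adapter clipping entirely, and simplifies SLIDINGWINDOW to cut at the start of the first failing window. The function names phred33 and trim_read are hypothetical.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def trim_read(seq: str, qual: str, headcrop: int = 15, leading: int = 20,
              trailing: int = 20, window: int = 4, window_q: float = 20.0,
              minlen: int = 100) -> Optional[Tuple[str, str]]:
    """Simplified HEADCROP -> LEADING -> TRAILING -> SLIDINGWINDOW -> MINLEN.

    ILLUMINACLIP (adapter clipping) is not modeled. Returns the trimmed
    (seq, qual) pair, or None if the read is discarded.
    """
    # HEADCROP: unconditionally remove the first `headcrop` bases.
    seq, qual = seq[headcrop:], qual[headcrop:]
    scores = phred33(qual)

    # LEADING: remove 5' bases while their quality is below `leading`.
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1
    # TRAILING: remove 3' bases while their quality is below `trailing`.
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1
    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]

    # SLIDINGWINDOW (simplified): scan 5' -> 3' and cut at the start of the
    # first window whose mean quality falls below `window_q`.
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < window_q:
            seq, qual = seq[:i], qual[:i]
            break

    # MINLEN: discard reads now shorter than `minlen`.
    if len(seq) < minlen:
        return None
    return seq, qual
```

With these defaults, a 150 bp read of uniformly high quality comes back as 135 bp, because HEADCROP:15 removes the first 15 bases and no other step trims it; any read shorter than 100 bp after trimming is discarded (None).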

The trimmed and filtered sequence data for all five mucilage metagenomes were analyzed on the MG-RAST web server and are available as public records under the following IDs. The four 2008 samples (mgm4504365.3, mgm4504364.3, mgm4504362.3, and mgm4504361.3) were uploaded as FASTQ files containing the reads that passed the processing described above. The 2013 sample (mgm4550815.3) was uploaded as a FASTA file of contigs assembled from its surviving FASTQ reads with the de Bruijn graph assembler IDBA-UD v1.1.0 [20]. The numbers of sequences uploaded to MG-RAST were 178,184 (mgm4504365.3), 19,282 (mgm4504364.3), 265,446 (mgm4504362.3), 5,024 (mgm4504361.3), and 74,429 (mgm4550815.3), and the corresponding average sequence lengths were 105 ± 21 bp, 150 ± 21 bp, 109 ± 25 bp, 149 ± 23 bp, and 1,475 ± 345 bp. The reads from all five aerial root mucilage metagenome libraries were analyzed collectively with MG-RAST version 4.0.3 to evaluate the relative abundance of functional gene categories in the Subsystems database. S1 Fig shows the workflow of this MG-RAST analysis. The overall microbial diversity in the aerial root mucilage samples was then assessed, also with MG-RAST version 4.0.3 [21], by querying the reads of all five metagenomes against the RefSeq database using the analysis tool with default settings. The results of the RefSeq query were then filtered by phylum (S2 Fig, S1 Table) and class (S2 Table).
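The per-library figures above can be combined into a quick summary. The Python sketch below is an illustration only and is not part of the MG-RAST workflow; the names Library, LIBRARIES, and summarize are hypothetical. It multiplies each library's sequence count by its reported mean length, so the base totals are approximate because the reported means are rounded. The result shows that the single contig library (mgm4550815.3) accounts for roughly 68% of the uploaded bases while contributing about 14% of the uploaded sequences.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Library:
    mgm_id: str     # MG-RAST metagenome ID
    year: int       # collection year
    upload: str     # "fastq" reads or "fasta" contigs
    n_seqs: int     # number of uploaded sequences
    mean_len: int   # reported mean sequence length (bp, rounded)


LIBRARIES = [
    Library("mgm4504365.3", 2008, "fastq", 178_184, 105),
    Library("mgm4504364.3", 2008, "fastq", 19_282, 150),
    Library("mgm4504362.3", 2008, "fastq", 265_446, 109),
    Library("mgm4504361.3", 2008, "fastq", 5_024, 149),
    Library("mgm4550815.3", 2013, "fasta", 74_429, 1_475),
]


def summarize(libs: List[Library]) -> Tuple[int, int, Dict[str, float]]:
    """Total sequences, approximate total bp, and each library's share of bp."""
    total_seqs = sum(lib.n_seqs for lib in libs)
    # Approximate bases per library as count x mean length (means are rounded).
    approx_bp = {lib.mgm_id: lib.n_seqs * lib.mean_len for lib in libs}
    total_bp = sum(approx_bp.values())
    shares = {mgm_id: bp / total_bp for mgm_id, bp in approx_bp.items()}
    return total_seqs, total_bp, shares


total_seqs, total_bp, shares = summarize(LIBRARIES)
print(total_seqs, total_bp, round(shares["mgm4550815.3"], 3))
# prints: 542365 161066585 0.682
```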

Subsystems annotation of all aerial root mucilage metagenome samples was carried out with MG-RAST version 3.3.6 using the default settings: a maximum e-value cutoff of 1e-5, a minimum percent identity cutoff of 60%, and a minimum alignment length cutoff of 15 (S3 Fig and S3 Table). The subsystem designated “Carbohydrate” was examined in detail to identify bacterial genes predicted to encode CAZymes involved in mucilage polysaccharide catabolism. This process identified partial DNA sequences in the mucilage metagenomes that were similar to annotated GenBank sequences from previously reported bacterial genomes. Because the original DNA extracts used to build the sequencing libraries were not available, the enzyme coding sequences could not be recovered from the environmental samples by a PCR-based approach; instead, the matching reference genome sequences were used for artificial gene synthesis. Table 1 lists these sequences of interest for CAZyme production and summarizes the criteria used to select them: a relatively high number of abundance hits across all five mucilage metagenome libraries, their alignment scores to genomic sequences in GenBank, the length of the reported sequence alignment, and the alignment e-values.
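The default cutoffs amount to a per-hit filter followed by a tally per subsystem category. MG-RAST applies these cutoffs on its own servers; the Python sketch below only illustrates the logic on locally held hits. The Hit record layout and the function names passes_cutoffs and category_abundance are hypothetical, and counting each passing hit once (rather than per read or per protein) is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One read-to-reference similarity hit (hypothetical record layout)."""
    read_id: str
    category: str     # level-1 subsystem category, e.g. "Carbohydrate"
    evalue: float
    identity: float   # percent identity
    align_len: int    # alignment length


def passes_cutoffs(hit: Hit, max_evalue: float = 1e-5,
                   min_identity: float = 60.0, min_len: int = 15) -> bool:
    """Apply the default cutoffs: e-value <= 1e-5, identity >= 60%, length >= 15."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.align_len >= min_len)


def category_abundance(hits: Iterable[Hit]) -> Dict[str, float]:
    """Fraction of passing hits assigned to each category (empty if none pass)."""
    counts = Counter(h.category for h in hits if passes_cutoffs(h))
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```

Hits that fail any single cutoff are dropped before the fractions are computed, so the fractions sum to 1 over the categories that remain.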

Analysis of the five mucilage metagenomes using the strategies depicted in Fig 1 and S1 Fig led to the selection of the following full-length coding sequences, which were annotated in genomes previously deposited in GenBank. “GH Family” gives the glycoside hydrolase family to which each selected enzyme coding sequence is assigned in the CAZy database. “Abundance Hits” is the number of partial sequences, across all five mucilage metagenome samples, with alignment hits to the selected full-length GH coding sequence. “Sequence Similarity” is the percent similarity obtained by aligning the mucilage metagenome partial sequences to the full-length GenBank coding sequence selected for gene synthesis.
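One way to compute the “Abundance Hits” column is to count, for each full-length reference coding sequence, the distinct partial sequences from any of the five metagenomes that align to it. The Python sketch below assumes a hypothetical alignment record of (sample ID, read ID, reference ID, percent identity), treats a read ID as unique only within its own sample, and summarizes “Sequence Similarity” as the mean percent identity of those partial sequences; the original analysis may have aggregated similarity differently. The functions abundance_hits and rank_candidates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (sample_id, read_id, reference_id, percent_identity): hypothetical layout
Alignment = Tuple[str, str, str, float]


def abundance_hits(alignments: Iterable[Alignment]) -> Dict[str, Tuple[int, float]]:
    """Per reference: (number of distinct partial sequences, mean % identity)."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    identities: Dict[str, List[float]] = defaultdict(list)
    for sample_id, read_id, ref_id, pct in alignments:
        key = (sample_id, read_id)  # read IDs are assumed unique only within a sample
        if key not in seen[ref_id]:  # count each partial sequence once per reference
            seen[ref_id].add(key)
            identities[ref_id].append(pct)
    return {ref: (len(seen[ref]), sum(identities[ref]) / len(identities[ref]))
            for ref in seen}


def rank_candidates(summary: Dict[str, Tuple[int, float]], top_n: int = 3) -> List[str]:
    """Reference IDs ordered by abundance hits, highest first."""
    return sorted(summary, key=lambda ref: summary[ref][0], reverse=True)[:top_n]
```

Ranking by abundance hits alone mirrors only the first selection criterion described for Table 1; alignment score, alignment length, and e-value would still need to be weighed separately.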
