Assembly, binning, reassembly, and gene annotations

Haohui Zhong
Laura Lehtovirta-Morley
Jiwen Liu
Yanfen Zheng
Heyu Lin
Delei Song
Jonathan D. Todd
Jiwei Tian
Xiao-Hua Zhang

In this study, IDBA-UD 1.1.2 was used to assemble the quality-controlled reads into scaffolds [66], and SPAdes 3.11.0 was used to reassemble the mapped reads [67]. Metagenomic read recruitment (mapping) was performed with BBMap 37.56 and bwa 0.7.5a [68, 69].

MetaBAT 2.12.1 [70] was used for binning, the process of grouping assembled scaffolds into "bins" according to scaffold properties such as tetranucleotide frequency patterns and differential sequencing coverage across samples. The assembly quality (estimated completeness and contamination) and initial phylogenetic placement of these bins were assessed with CheckM 1.0.7 [31]. Genome annotation was based on arCOGs, using Prodigal 2.6.3, BLAST+ 2.2.30, and HMMER 3.1b2 [37, 71–73]. Coding sequences were predicted with Prodigal using default settings and then searched against the arCOG database with both BLAST and HMMER using the recommended threshold (E-value < 1e−5). To make sure the annotation was robust, we also used the automated online annotation service RAST with default settings [74]. Genes with ambiguous or uncertain annotations were rechecked using the online services of InterPro and NCBI's Conserved Domain Database [75, 76].
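
To illustrate the tetranucleotide-frequency (TNF) signal that binning relies on, the minimal Python sketch below computes a normalized 4-mer profile for a scaffold, merging each 4-mer with its reverse complement, and a plain Euclidean distance between two profiles. This is a simplification for illustration only. MetaBAT combines TNF with coverage using its own calibrated scoring, which is not reproduced here, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Merge a k-mer with its reverse complement (keep the smaller string)."""
    return min(kmer, revcomp(kmer))


# All canonical tetranucleotides: 136 for k = 4 (256 k-mers, 16 palindromes).
CANONICAL = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequency(seq: str) -> dict:
    """Relative frequency of each canonical 4-mer in a scaffold."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= VALID:  # skip windows containing N or other ambiguity codes
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in CANONICAL}


def tnf_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two TNF profiles (illustrative, not MetaBAT's score)."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in CANONICAL))


scaffold = "ATGCGTACGTTAGCCATGGA" * 10
profile = tetranucleotide_frequency(scaffold)
print(len(profile))  # 136
print(tnf_distance(profile, tetranucleotide_frequency(revcomp(scaffold))))  # 0.0
```

Merging reverse complements makes the profile independent of the strand on which a scaffold happened to be assembled. This is why a scaffold and its reverse complement have a distance of zero. Scaffolds from the same genome tend to have similar profiles.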

Except for MTA6, all MTAs were generated by binning. The initial version of MTA1 was obtained from the four deepest samples merged together: the particle-associated and free-living fractions from 10,400 and 10,500 m. To increase the completeness of MTA1, reads from the 8000 and 9600 m samples were also recruited by reference-based mapping at 97% identity. In the final step, to ensure that MTA1 was not a mixture derived from different samples, we reassembled only those reads from the 8000 m free-living sample (where MTA1 was most abundant) that mapped to the initial MTA1 at 97% identity. All analyses in this study were based on this final assembly, which therefore originated from a single sample. Two other MTAs were obtained directly by binning a single sample (2000 m depth). MTA6 was a reference-based assembly of reads mapped to Ca. N. brevis CN25 at 97% identity, because we found one amoA gene at 2000 m depth that was almost identical to the amoA gene of this strain. No other amoA genes (e.g., those of ammonia-oxidizing bacteria) were found in any of these samples.
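
The 97% identity criterion used for read recruitment can be illustrated with a minimal Python sketch that filters SAM alignment records. The protocol does not state how identity was computed. Here we assume identity = 1 − NM / (aligned columns), taken from the CIGAR string and the SAM `NM` tag. This is one common definition and not necessarily the one BBMap or bwa applies internally. The function names are hypothetical.

```python
import re

# SAM CIGAR operations: length followed by one operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar: str, nm: int) -> float:
    """Identity = 1 - NM / alignment columns, counting M, =, X, I and D operations.

    NM is the SAM edit distance (mismatches + inserted + deleted bases).
    Clipped (S/H), skipped (N) and padding (P) operations are ignored.
    """
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    return 1.0 - nm / columns if columns else 0.0


def passes_identity(sam_line: str, min_identity: float = 0.97) -> bool:
    """Return True if a mapped SAM record meets the (assumed) identity threshold."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":  # 0x4 = segment unmapped
        return False
    # Optional tags start at column 12 (index 11); look for the NM edit distance.
    nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), None)
    if nm is None:
        return False  # identity cannot be computed without NM
    return alignment_identity(cigar, nm) >= min_identity


record = "\t".join(["read1", "0", "scaffold_1", "1", "60", "100M", "*", "0", "0",
                    "A" * 100, "I" * 100, "NM:i:2"])
print(passes_identity(record))                               # True  (identity 0.98)
print(passes_identity(record.replace("NM:i:2", "NM:i:5")))   # False (identity 0.95)
```

Identity can also be defined relative to the full read length, including clipped bases. As a result, a 97% threshold is not directly interchangeable between tools that use different definitions.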
