Assembly

Anton Lavrinienko
Eugene Tukalenko
Timothy A. Mousseau
Luke R. Thompson
Rob Knight
Tapio Mappes
Phillip C. Watts

Read data were processed using ATROPOS [32] v.1.1.5 (parameters: -q 15 --minimum-length 90), after which the reads were mapped (BOWTIE2 [33] v.2.3.4, parameters: --very-sensitive) against a draft bank vole genome (GCA_001305785.1). Host-derived reads were then filtered out of the resulting *.bam files by retaining only read pairs in which neither mate mapped to the host genome and excluding secondary alignments (SAMTOOLS v.1.4 view, parameters: -f 12 -F 256) (https://www.htslib.org/doc/samtools.html). Assembly of metagenome read data to obtain draft bacterial genomes followed the approach used to recover draft genomes from the TARA oceans metagenomics data [34]. Samples were assembled individually using MEGAHIT [35] v.1.1.1-2 (parameters: default) to reduce memory requirements and in an attempt to avoid bubbles (unresolvable branches) caused by genetic diversity among strains. Using MEGAHIT, we assembled a total of 1,057 million paired-end reads into 4,721,549 primary contigs (5,916,721,003 bp). These primary contigs were filtered to retain only contigs ≥ 2 kbp in length, which were then passed through CD-HIT-EST [36] v.4.7.0 (parameters: -c 0.99 -n 11 -M 0) to merge completely overlapping contigs (at 99% identity). This reduced set of primary contigs was then co-assembled using MINIMUS2 in AMOS [37] v.3.1.0 (parameters: -D REFCOUNT = 0 OVERLAP = 100 MINID = 95) to combine overlapping contigs. This procedure left 171,806 secondary contigs (39,092 merged contigs and 132,714 singletons) for binning.
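Two of the filtering steps above reduce to simple rules, and a minimal Python sketch may make them concrete. It is an illustration only, not the authors' pipeline: the function names `is_non_host_pair` and `filter_contigs` and the toy inputs are hypothetical, and 2 kbp is assumed to mean 2,000 bases. The first function mirrors the logic of `samtools view -f 12 -F 256` on a SAM flag (both "read unmapped" and "mate unmapped" bits set, "secondary alignment" bit clear); the second keeps contigs at or above the length threshold.

```python
# Illustrative sketch only: names and toy data are hypothetical,
# and this does not replace SAMTOOLS or the published pipeline.

UNMAPPED = 0x4         # SAM flag bit: the read itself is unmapped
MATE_UNMAPPED = 0x8    # SAM flag bit: the read's mate is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment (256)


def is_non_host_pair(flag):
    """Mimic `-f 12 -F 256`: require bits 4 and 8, reject bit 256."""
    required = UNMAPPED | MATE_UNMAPPED  # 4 + 8 = 12
    return (flag & required) == required and (flag & SECONDARY) == 0


def filter_contigs(contigs, min_len=2000):
    """Keep contigs whose length is >= min_len (assumed 2 kbp = 2,000 bases)."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


if __name__ == "__main__":
    # 77 and 141 are the two mates of a fully unmapped pair;
    # 99 is a properly mapped read; 73 has only its mate unmapped.
    for flag in (77, 141, 99, 73):
        print(flag, is_non_host_pair(flag))

    toy = {"short": "A" * 1999, "edge": "A" * 2000, "long": "A" * 5000}
    print(sorted(filter_contigs(toy)))
```

Only reads from pairs in which both mates failed to map to the bank vole genome pass the first check, which is why a read whose mate alone is unmapped (flag 73) is rejected.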
