Mean coverage, relative abundances, and recovery success of MAGs

Caitlin M. Singleton; Francesca Petriglieri; Jannie M. Kristensen; Rasmus H. Kirkegaard; Thomas Y. Michaelsen; Martin H. Andersen; Zivile Kondrotaite; Søren M. Karst; Morten S. Dueholm; Per H. Nielsen; Mads Albertsen

This method is extracted from research article: Nat Commun, Mar 2021

Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

DOI: 10.1038/s41467-021-22203-2

Mean depth of coverage based on the Illumina and Nanopore data was calculated using CoverM v0.3.2 (https://github.com/wwood/CoverM) and the filtered bam files created during the mmlong process described above, where the metagenome Illumina and Nanopore reads were mapped directly to the corresponding metagenome assembly (Supplementary Data 3). The following arguments were used: coverm genome -m mean --min-read-aligned-percent 0 --min-read-percent-identity 0 --min-covered-fraction 0. Relative abundances of the MAG species representatives in the metagenomes were calculated by mapping the Illumina data for each of the 69 metagenomes to a concatenated fasta file of the 581 bins using default CoverM settings, except for the following arguments: coverm genome -m relative_abundance --min-read-aligned-percent 0.75 --min-read-percent-identity 0.95 --min-covered-fraction 0. Stringent identity and alignment cutoffs were used to minimize spurious mappings that would falsely inflate abundances.
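As a convenience for rerunning the two CoverM calls, the argument sets quoted above can be kept in one place and combined with the input arguments for a given sample. The following is a minimal Python sketch, not the authors' script: only the arguments listed in the text are fixed, the method name is written with an underscore (relative_abundance), and the input flag and file name in the usage note are placeholders that should be checked against `coverm genome --help` for the installed version.

```python
import shlex

# Arguments quoted in the text for the mean depth-of-coverage run.
MEAN_COVERAGE_ARGS = [
    "coverm", "genome", "-m", "mean",
    "--min-read-aligned-percent", "0",
    "--min-read-percent-identity", "0",
    "--min-covered-fraction", "0",
]

# Arguments quoted in the text for the relative-abundance run
# (stringent alignment and identity cutoffs).
RELATIVE_ABUNDANCE_ARGS = [
    "coverm", "genome", "-m", "relative_abundance",
    "--min-read-aligned-percent", "0.75",
    "--min-read-percent-identity", "0.95",
    "--min-covered-fraction", "0",
]


def build_command(base_args, input_args):
    """Return the fixed arguments followed by caller-supplied input arguments.

    input_args is NOT specified in the text; the caller decides how reads,
    bam files and genomes are passed to CoverM.
    """
    return list(base_args) + list(input_args)


def as_shell(command):
    """Render an argument list as a single shell-quoted command line."""
    return shlex.join(command)
```

For example, `as_shell(build_command(MEAN_COVERAGE_ARGS, ["--bam-files", "sample.bam"]))` yields a command line that can be logged or passed to `subprocess.run` as a list; `sample.bam` is a hypothetical file name.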

The proportion of the metagenome community recovered in the assembly or the HQ MAG set was investigated using SingleM v0.12.1 (https://github.com/wwood/singlem). SingleM identifies single-copy marker genes of 14 ribosomal proteins in short-read data, assemblies, and bins, and avoids the complications of MAG recovery estimates based on multicopy 16S rRNA genes. “singlem pipe” was run on the individual metagenomes, assemblies, and HQ MAGs. “singlem summarise” was then run on the MAG operational taxonomic unit (OTU) files produced by “singlem pipe” to concatenate them into one large table for “singlem appraise”. This table of OTUs, representing 14 single-copy marker genes, could then be compared to the metagenomes and their corresponding assemblies using “singlem appraise”, which allowed us to determine the percentage of the metagenome community recovered at each step. For the genus-level recovery estimates, the flags --imperfect --sequence_identity 0.89 were used to cluster the OTUs at 89% ANI, or roughly the genus level⁸.
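The idea behind the recovery estimate can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not SingleM's implementation: OTUs are reduced to (marker gene, window sequence, count) records, a metagenome OTU counts as recovered when a genome OTU of the same marker gene matches it, and the relaxed genus-level match is approximated by position-wise identity between equal-length sequences. The names `Otu`, `identity` and `percent_recovered` are hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified OTU record: marker gene name, window sequence, read count.
Otu = namedtuple("Otu", ["gene", "sequence", "count"])


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def percent_recovered(metagenome_otus, genome_otus, min_identity=1.0):
    """Percentage of metagenome OTU counts matched by any genome OTU.

    min_identity=1.0 requires exact matches; a lower value such as 0.89
    mimics a relaxed, roughly genus-level comparison (an approximation).
    """
    sequences_by_gene = {}
    for otu in genome_otus:
        sequences_by_gene.setdefault(otu.gene, []).append(otu.sequence)

    total = 0
    matched = 0
    for otu in metagenome_otus:
        total += otu.count
        candidates = sequences_by_gene.get(otu.gene, [])
        if any(identity(otu.sequence, s) >= min_identity for s in candidates):
            matched += otu.count
    return 100.0 * matched / total if total else 0.0
```

Calling `percent_recovered(metagenome, mags)` and `percent_recovered(metagenome, mags, min_identity=0.89)` on the same toy tables shows how relaxing the identity threshold raises the recovered fraction when near-identical marker sequences are present.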

Unbinned populations were also identified using the SingleM data, specifically the OTUs that were abundant in the metagenomes but not matched to the HQ MAG set. Here, “singlem pipe” was run with a stringent e-value of 1e-20 on the HQ MAGs to avoid spurious hits to homologous regions, and “singlem appraise” with the flag --output_unaccounted_for_otu_table was used to produce an output table of the unbinned hits (present in the metagenome but not the HQ MAGs). This table was transformed into biom format using “singlem summarise” with the flag --biom_prefix, and the resulting tables were imported into R v3.5.2 using the ampvis2 v2.5.8 R package (https://github.com/MadsAlbertsen/ampvis2). Relative abundance heatmaps of the unbinned populations (Supplementary Figs. 2–4) were produced with these data. These data were also used with “singlem query” to examine the envOPS12 populations that were not successfully recovered. The unbinned marker gene sequences of envOPS12 were used as input to query the MQ and low-quality bins of the EsbE and EsbW metagenomes.
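The notion of an "unaccounted for" OTU table can be sketched in a few lines of Python. This is an illustrative simplification, not the SingleM or ampvis2 code: OTUs are keyed by a (marker gene, sequence) pair, matching is exact, and relative abundance is expressed per marker gene. The function name `unbinned_otus` and the `min_percent` threshold are hypothetical.

```python
def unbinned_otus(metagenome_otus, mag_otus, min_percent=0.0):
    """List metagenome OTUs that are absent from the MAG set.

    metagenome_otus: dict mapping (gene, sequence) -> read count.
    mag_otus: set of (gene, sequence) pairs found in the HQ MAGs.
    Returns (gene, sequence, count, percent) tuples sorted by percent,
    where percent is the share of that marker gene's total count.
    """
    # Total count per marker gene, used as the denominator.
    gene_totals = {}
    for (gene, _), count in metagenome_otus.items():
        gene_totals[gene] = gene_totals.get(gene, 0) + count

    rows = []
    for (gene, sequence), count in metagenome_otus.items():
        if (gene, sequence) in mag_otus:
            continue  # accounted for by a MAG
        percent = 100.0 * count / gene_totals[gene]
        if percent >= min_percent:
            rows.append((gene, sequence, count, percent))

    # Most abundant unbinned OTUs first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Raising `min_percent` restricts the output to the abundant unbinned populations, which is the subset of interest for a relative abundance heatmap.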

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
