Mean coverage, relative abundances, and recovery success of MAGs

Caitlin M. Singleton
Francesca Petriglieri
Jannie M. Kristensen
Rasmus H. Kirkegaard
Thomas Y. Michaelsen
Martin H. Andersen
Zivile Kondrotaite
Søren M. Karst
Morten S. Dueholm
Per H. Nielsen
Mads Albertsen

Mean depth of coverage for the Illumina and Nanopore data was calculated using CoverM v0.3.2 (https://github.com/wwood/CoverM) on the filtered BAM files created during the mmlong process described above, in which the metagenome Illumina and Nanopore reads were mapped directly to the corresponding metagenome assembly (Supplementary Data 3). The following arguments were used: coverm genome -m mean --min-read-aligned-percent 0 --min-read-percent-identity 0 --min-covered-fraction 0. Relative abundances of the MAG species representatives in the metagenomes were calculated by mapping the Illumina data for each of the 69 metagenomes to a concatenated FASTA file of the 581 bins using default CoverM settings, except for the following arguments: coverm genome -m relative_abundance --min-read-aligned-percent 0.75 --min-read-percent-identity 0.95 --min-covered-fraction 0. Stringent identity and alignment cutoffs were used to prevent spurious mappings from falsely inflating abundances.
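To make the two coverage quantities above concrete, the following is a minimal Python sketch, not CoverM's implementation. It assumes that mean coverage is the average per-base depth over a genome, and that relative abundance is each genome's share of the summed mean coverage, scaled by the fraction of reads that mapped. Both are simplifying assumptions (details such as contig-end handling are ignored), and the bin names, depths, and mapped fraction are hypothetical toy values.

```python
def mean_coverage(per_base_depth):
    """Mean depth over all positions of a genome (zero-depth positions count)."""
    return sum(per_base_depth) / len(per_base_depth)


def relative_abundance(mean_cov_by_genome, mapped_read_fraction):
    """Percent relative abundance per genome.

    Assumed formula: genome coverage / total coverage, scaled by the
    fraction of reads that mapped to any genome, times 100.
    """
    total = sum(mean_cov_by_genome.values())
    if total == 0:
        return {name: 0.0 for name in mean_cov_by_genome}
    return {
        name: 100.0 * mapped_read_fraction * cov / total
        for name, cov in mean_cov_by_genome.items()
    }


# Hypothetical per-base depths for two toy bins (not data from the study).
depths = {
    "bin_A": [10, 12, 8, 10],
    "bin_B": [30, 30, 30, 30],
}
coverage = {name: mean_coverage(d) for name, d in depths.items()}

# Hypothetical: 80% of the reads passed the identity and alignment cutoffs.
abundance = relative_abundance(coverage, mapped_read_fraction=0.8)
```

Under these assumptions, raising the identity and alignment cutoffs lowers the mapped read fraction, so reads that only align loosely do not contribute to any bin's abundance.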

The proportion of the metagenome community recovered in the assembly or the HQ MAG set was investigated using SingleM v0.12.1 (https://github.com/wwood/singlem). SingleM identifies single-copy marker genes encoding 14 ribosomal proteins in short-read data, assemblies, and bins, and so avoids the complications of MAG recovery estimates based on multicopy 16S rRNA genes. “singlem pipe” was run on the individual metagenomes, assemblies, and HQ MAGs. “singlem summarise” was then run on the MAG operational taxonomic unit (OTU) tables produced by “singlem pipe” to concatenate them into one large table. This table of OTUs, representing the 14 single-copy marker genes, was then compared to the metagenomes and their corresponding assemblies using “singlem appraise”, which allowed us to determine the percentage of the metagenome community recovered at each step. For the genus-level recovery estimates, the flags --imperfect --sequence_identity 0.89 were used to cluster the OTUs at 89% ANI, or roughly the genus level (ref. 8).
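The recovery estimate described above can be illustrated with a small Python sketch. It is a toy model of the idea rather than SingleM's algorithm: OTUs are treated as equal-length marker gene windows, a metagenome OTU counts as recovered if a MAG OTU of the same marker gene reaches the identity threshold, and recovery is the matched share of total OTU coverage. The gene name, sequences, and coverages are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length OTU windows."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def recovered_fraction(metagenome_otus, mag_otus, min_identity=1.0):
    """Share of metagenome OTU coverage matched by a MAG OTU of the same gene.

    metagenome_otus: list of (gene, sequence, coverage)
    mag_otus:        list of (gene, sequence)
    """
    total = sum(cov for _, _, cov in metagenome_otus)
    matched = sum(
        cov
        for gene, seq, cov in metagenome_otus
        if any(g == gene and identity(seq, s) >= min_identity for g, s in mag_otus)
    )
    return matched / total if total else 0.0


# Hypothetical toy OTUs for one marker gene (20 nt windows for brevity).
metagenome = [
    ("rplB", "ACGTACGTACGTACGTACGT", 50.0),  # identical to the MAG OTU
    ("rplB", "ACGTACGTACGTACGTACGA", 30.0),  # one mismatch: 95% identity
    ("rplB", "TTTTTTTTTTGGGGGGGGGG", 20.0),  # unrelated sequence
]
mags = [("rplB", "ACGTACGTACGTACGTACGT")]

exact = recovered_fraction(metagenome, mags)                         # exact matches only
genus_level = recovered_fraction(metagenome, mags, min_identity=0.89)  # relaxed threshold
```

In this toy case the relaxed threshold also counts the near-identical OTU as recovered, which mirrors the reasoning behind the genus-level estimate.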

Unbinned populations were also identified using the SingleM data, specifically the OTUs that were abundant in the metagenomes but not matched to the HQ MAG set. Here, “singlem pipe” was run on the HQ MAGs with a stringent e-value of 1e-20 to avoid spurious hits to homologous regions, and “singlem appraise” with the flag --output_unaccounted_for_otu_table was used to produce a table of the unbinned hits (present in the metagenome but not in the HQ MAGs). This table was converted to BIOM format using “singlem summarise” with --biom_prefix, and the resulting tables were imported into R v3.5.2 using the ampvis2 v2.5.8 R package (https://github.com/MadsAlbertsen/ampvis2). Relative abundance heatmaps of the unbinned populations (Supplementary Figs. 24) were produced with these data. These data were also used, together with “singlem query”, to examine the envOPS12 populations that were not successfully recovered: the unbinned marker gene sequences of envOPS12 were used as input to query the MQ and low-quality bins of the EsbE and EsbW metagenomes.
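The logic behind the unbinned table and its heatmap values can be sketched in Python as follows. This is only an illustration of the filtering and normalisation steps. The study itself used “singlem appraise” and the ampvis2 R package, and the row layout, sequences, coverages, and the "genus_X" label here are hypothetical. It assumes an OTU is unbinned when its window sequence is absent from the HQ MAG OTU set, and that abundance is expressed as a percentage of each sample's total OTU coverage.

```python
from collections import defaultdict


def unaccounted_otus(metagenome_rows, mag_sequences):
    """Keep metagenome OTU rows whose sequence is absent from the MAG OTU set."""
    return [row for row in metagenome_rows if row["sequence"] not in mag_sequences]


def relative_abundance_by_sample(rows, all_rows):
    """Percent of each sample's total OTU coverage per (sample, taxonomy) in rows."""
    totals = defaultdict(float)
    for row in all_rows:
        totals[row["sample"]] += row["coverage"]
    table = defaultdict(float)
    for row in rows:
        key = (row["sample"], row["taxonomy"])
        table[key] += 100.0 * row["coverage"] / totals[row["sample"]]
    return dict(table)


# Hypothetical OTU rows for two samples (values are not from the study).
rows = [
    {"sample": "EsbE", "sequence": "AAAA", "coverage": 60.0, "taxonomy": "genus_X"},
    {"sample": "EsbE", "sequence": "CCCC", "coverage": 40.0, "taxonomy": "envOPS12"},
    {"sample": "EsbW", "sequence": "AAAA", "coverage": 75.0, "taxonomy": "genus_X"},
    {"sample": "EsbW", "sequence": "GGGG", "coverage": 25.0, "taxonomy": "envOPS12"},
]
mag_sequences = {"AAAA"}  # OTU windows found in the HQ MAGs

unbinned = unaccounted_otus(rows, mag_sequences)
heatmap_values = relative_abundance_by_sample(unbinned, rows)
```

The sequences retained in `unbinned` correspond to the kind of marker gene windows that could then be searched against lower-quality bins.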
