Binning followed the hybrid-metaflow pipeline of mmlong v0.1.2 after the assembly step (https://github.com/SorenKarst/mmlong). Briefly, Nanopore and Illumina reads were mapped to the polished Nanopore assembly using minimap2 v2.15. Automatic binning was conducted using MetaBAT2 v2.12.1 (ref. 59) and MaxBin v2.2.7 (ref. 60). Metagenome contigs were translated into proteins using FragGeneScan v1.31 (ref. 61) and annotated taxonomically using Kaiju v1.6.0 (ref. 62) against the proGenomes database (2017-05-16). 16S rRNA genes were identified with barrnap v0.9 (https://github.com/tseemann/barrnap) and then classified with MOTHUR v2.7.14 classify.seqs against the SILVA v132 seed database (ref. 63). Each of the two binning tools was run with two coverage approaches: differential coverage information from only the same plant as the assembly (i.e., the corresponding three Illumina metagenomes from 2016, 2017, and 2018), or differential coverage information from all of the Illumina metagenomes (69 in total). DASTool v1.1.1 (ref. 64) with --search_engine diamond was used to dereplicate and select the best representative bin from the four binning iterations for each of the 23 metagenomes.
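The four binning iterations per assembly (two binning tools, each run with two coverage sets) can be enumerated as in the following minimal sketch. It is an illustration only and not part of mmlong; the function name, dictionary keys, and coverage-set labels are hypothetical, while the tool names and sample counts are taken from the text.

```python
from itertools import product

# Binning tools named in the text.
BINNERS = ("MetaBAT2", "MaxBin")

# Coverage sets named in the text, with the number of Illumina metagenomes
# contributing differential coverage. The labels are hypothetical.
COVERAGE_SETS = {"same_plant": 3, "all_plants": 69}


def binning_iterations(assembly_id):
    """Return one record per (binner, coverage set) combination for an assembly."""
    return [
        {
            "assembly": assembly_id,
            "binner": binner,
            "coverage_set": label,
            "n_illumina_samples": n_samples,
        }
        for binner, (label, n_samples) in product(BINNERS, COVERAGE_SETS.items())
    ]
```

Each assembly therefore yields four candidate bin sets, which is the input that a consensus step such as DASTool would receive; across 23 assemblies this gives 92 bin sets in total.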
The dereplicated bins were then checked for completeness and contamination using CheckM v1.0.11 --lineage_wf (ref. 65), resulting in 3733 MQ to HQ MAGs. Circular genomes were identified by linking the contig names back to the suggestCircular=yes CANU designations. Circular contigs >700 kbp were identified as likely circular chromosomes. Ultra-small genomes with a circular chromosome were included in the HQ set despite not fulfilling the completeness cut-off of >90%.
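The circularity and quality rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes contig headers still carry Canu's space-separated key=value fields (len=, suggestCircular=), and all function names are hypothetical. The >700 kbp and >90% cut-offs come from the text; the 5% HQ contamination limit and the MQ limits (at least 50% complete, below 10% contamination) are assumptions borrowed from common MAG reporting conventions and should be replaced with the study's actual thresholds.

```python
MIN_CIRCULAR_BP = 700_000      # "> 700 kbp" cut-off stated in the text
HQ_MIN_COMPLETENESS = 90.0     # ">90%" cut-off stated in the text
HQ_MAX_CONTAMINATION = 5.0     # assumption, not stated in the text
MQ_MIN_COMPLETENESS = 50.0     # assumption, not stated in the text
MQ_MAX_CONTAMINATION = 10.0    # assumption, not stated in the text


def parse_canu_header(header):
    """Split a Canu-style FASTA header into the contig name and key=value fields."""
    name, *fields = header.lstrip(">").split()
    info = dict(field.split("=", 1) for field in fields if "=" in field)
    info["name"] = name
    return info


def is_likely_circular_chromosome(header):
    """True if Canu suggested circularity and the contig is longer than 700 kbp."""
    info = parse_canu_header(header)
    return (
        info.get("suggestCircular") == "yes"
        and int(info.get("len", 0)) > MIN_CIRCULAR_BP
    )


def quality_tier(completeness, contamination, has_circular_chromosome=False):
    """Assign HQ, MQ, or LQ from CheckM-style percentages.

    A MAG with a circular chromosome is exempt from the completeness
    cut-off, mirroring the ultra-small genome exception in the text.
    """
    passes_completeness = (
        completeness > HQ_MIN_COMPLETENESS or has_circular_chromosome
    )
    if contamination < HQ_MAX_CONTAMINATION and passes_completeness:
        return "HQ"
    if (completeness >= MQ_MIN_COMPLETENESS
            and contamination < MQ_MAX_CONTAMINATION):
        return "MQ"
    return "LQ"
```

Because polishing steps may rewrite contig headers, the lookup would in practice be made against the original Canu output by contig name, which is what "linking the contig names back" refers to.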
Five MAGs that failed the contamination threshold were manually examined using mmgenome2 v2.0.7 (https://github.com/KasperSkytte/mmgenome2): four were circular and one was a MAG of interest (Nitrospira). Likely contaminating contigs were removed and quality was re-checked with CheckM. These manually improved MAGs are identified by “v2” in the MAG ID in the Supplementary Tables. Of 43 MAGs containing a circular contig of >700 kbp, 29 circular MAGs were manually checked and additional non-circular extraneous contigs were removed. These are identified by “cln” in the MAG ID. Three MAGs that contained circular contigs but also large additional linear contigs encoding single-copy marker genes were potentially multichromosomal and were removed from the CMAG designations.
The MAGs were dereplicated using dRep v2.3.2 (ref. 66) with -comp 50 -con 10, at 99% ANI clustering to indicate the number of distinct lineages and the overlap of likely strains between WWTPs, and at 95% ANI to indicate the number of distinct species.
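The two dereplication runs could be assembled as in the sketch below. The text states only the -comp 50 -con 10 options; the use of the dereplicate subcommand, the -g genome list, and the -sa option for the ANI threshold are assumptions about how dRep was invoked, and the function name and directory names are hypothetical.

```python
def drep_command(work_dir, genome_paths, ani):
    """Build an argument list for one dRep dereplication run.

    -comp/-con values are taken from the text; passing the ANI
    threshold through -sa is an assumption.
    """
    return [
        "dRep", "dereplicate", work_dir,
        "-g", *genome_paths,
        "-comp", "50",
        "-con", "10",
        "-sa", str(ani),
    ]


# One run per clustering level described in the text.
ANI_LEVELS = {"drep_ani99": 0.99, "drep_ani95": 0.95}
```

Each argument list could then be passed to subprocess.run(cmd, check=True) on a system where dRep is installed, once for each entry in ANI_LEVELS.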