To identify viral contigs, several filtering steps were applied. All contigs ≥ 10 kb and circular contigs < 10 kb [53] were compared using MASH v2.0 [63], separately, against the RefSeq70 database [64] and a publicly available database of phage genomes (March 2020; P = 0.01). If the closest RefSeq70 hit was to a phage/virus, the contig was included as a viral operational taxonomic unit (vOTU). Failing this, if the contig obtained a closer hit to the phage database than to RefSeq70, the contig was included as a vOTU. Remaining contigs were included as vOTUs if they satisfied at least two of the following criteria: (1) VIBRANT v1.0.1 [65] classified the sequence as viral; (2) the contig obtained an adjusted p-value ≤ 0.05 from DeepVirFinder v1.0 [66]; (3) 30% of ORFs on the contig obtained a hit to a prokaryotic virus orthologous group (pVOG) [67] using hmmscan v3.1b2 (-E 0.001) [68]. However, circular contigs < 10 kb only had to satisfy either criterion 1 or criterion 3, as DeepVirFinder scores for these contigs were inconsistent.
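The decision rules above can be summarised as a short sketch. This is an illustration only, not the authors' code: the `Contig` record and its field names are hypothetical, "closer hit" is assumed to mean a smaller MASH distance, and the pVOG criterion is assumed to mean at least 30% of ORFs. The per-tool results (MASH, VIBRANT, DeepVirFinder, hmmscan) are assumed to have been parsed into these fields beforehand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    # Hypothetical record; all field names are illustrative assumptions.
    length_bp: int
    circular: bool
    refseq_hit_is_viral: bool      # closest RefSeq70 MASH hit is a phage/virus
    refseq_dist: Optional[float]   # MASH distance to closest RefSeq70 hit (None = no hit)
    phage_dist: Optional[float]    # MASH distance to closest phage-database hit (None = no hit)
    vibrant_viral: bool            # criterion 1: VIBRANT calls the sequence viral
    dvf_adj_p: float               # criterion 2: DeepVirFinder adjusted p-value
    pvog_fraction: float           # criterion 3: fraction of ORFs with a pVOG hit


def is_votu(c: Contig) -> bool:
    """Return True if the contig would be kept as a vOTU under the stated rules."""
    short_circular = c.circular and c.length_bp < 10_000

    # Only contigs >= 10 kb, or circular contigs < 10 kb, are considered at all.
    if c.length_bp < 10_000 and not c.circular:
        return False

    # Step 1: closest RefSeq70 hit is a phage/virus.
    if c.refseq_hit_is_viral:
        return True

    # Step 2: closer hit to the phage database than to RefSeq70
    # (assumed here to mean a smaller MASH distance).
    if c.phage_dist is not None and (
        c.refseq_dist is None or c.phage_dist < c.refseq_dist
    ):
        return True

    # Step 3: evidence from the three prediction criteria.
    crit1 = c.vibrant_viral
    crit2 = c.dvf_adj_p <= 0.05
    crit3 = c.pvog_fraction >= 0.30  # assumption: "30%" read as "at least 30%"

    if short_circular:
        # DeepVirFinder is ignored for short circular contigs.
        return crit1 or crit3
    return (crit1 + crit2 + crit3) >= 2
```

For example, a linear 12 kb contig with no viral MASH hit that is called viral by VIBRANT and has a DeepVirFinder adjusted p-value of 0.01 passes on two criteria, whereas a circular 5 kb contig supported only by DeepVirFinder does not.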