Viral prediction and generation of viral operational taxonomic unit (vOTU) set

Alexa M. Nicolas
Ella T. Sieradzki
Jennifer Pett-Ridge
Jillian F. Banfield
Michiko E. Taga
Mary K. Firestone
Steven J. Blazewicz

To establish a set of viral genomes from our metagenomic efforts, we used multiple prediction methods and dereplicated vOTUs across samples. Viruses were predicted from metagenome and virome contigs using default parameters of the following programs: VirSorter1 v1.0.6 [99] (all categories), VirSorter2 v2.2 [100] (dsDNA and ssDNA only), VIBRANT v1.2.1 [101], DeepVirFinder v1.0 [102] (score ≥0.9; P value ≤0.05), and Seeker v1.0.3 [103]. Of the virome and metagenome contigs explored, 3,979,792 were predicted to be viral. To establish a robust viral set, we retained only those predicted viral sequences identified by more than one tool [104]. In total, 777,725 contigs met this criterion, and 32,660 of these were either predicted to be circularized contigs by VRCA [105] (i.e., implied to be complete genomes) or ≥10,000 base pairs long. This set of size-filtered or circularized, repeatedly predicted viruses was dereplicated using MMseqs2 v13-45111 [106] (95% ANI, 85% breadth required of sequences) to 26,368 viral operational taxonomic units (vOTUs). All vOTUs were annotated using DRAM-v v1.2.0 with default parameters [106]. To assess viral lifestyle, we searched the DRAM-v output for integrase-annotated ORFs and designated integrase-containing vOTUs as temperate viruses with the capacity to integrate into a host chromosome. A single-gene approach will not capture all temperate viral diversity, but integrase genes have been used as an effective hallmark, with much higher recovery than other genes (e.g., excisionases), in previous peer-reviewed studies [27,34,55,68,107,108].

To identify potential Inoviridae among our vOTUs, we performed HMM searches for the known pI-like ATPase protein family using 32 protein models built from alignments of pI-like ATPases from Inoviridae identified in RefSeq [109]; the models are publicly available (https://github.com/simroux/Inovirus/blob/master/Inovirus_classifier/Ino_classifier_db/pI_PCs_db_annot.hmm). We used HMMER v3.1b2 (http://hmmer.org) with default parameters (i.e., E-value ≤ 10). No viral proteins met even these lax parameters to be considered homologous to Inoviridae pI-like ATPases.
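The two filtering steps described above (keeping contigs predicted by more than one tool, then keeping those that are circular or at least 10,000 bp long) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the input structures, and the toy contig IDs are hypothetical, and parsing each tool's actual output into sets of contig IDs is assumed to have been done beforehand.

```python
from collections import Counter

MIN_TOOLS = 2            # "identified by more than one tool"
MIN_LENGTH_BP = 10_000   # length threshold stated in the text


def consensus_viral_contigs(predictions, min_tools=MIN_TOOLS):
    """Return contig IDs called viral by at least `min_tools` tools.

    `predictions` maps a tool name to the contig IDs it called viral
    (hypothetical structure; each tool's output must be parsed first).
    """
    counts = Counter()
    for contigs in predictions.values():
        counts.update(set(contigs))  # set(): count each tool once per contig
    return {contig for contig, n in counts.items() if n >= min_tools}


def size_or_circular_filter(contig_ids, lengths, circular,
                            min_length=MIN_LENGTH_BP):
    """Keep contigs that are flagged circular OR at least `min_length` bp."""
    return {
        contig for contig in contig_ids
        if contig in circular or lengths.get(contig, 0) >= min_length
    }


# Toy data for illustration only; these are not values from the study.
predictions = {
    "VirSorter1": {"c1", "c2", "c3"},
    "VirSorter2": {"c1", "c4"},
    "VIBRANT": {"c2", "c5"},
    "DeepVirFinder": {"c1", "c2", "c6"},
    "Seeker": {"c6"},
}
lengths = {"c1": 12_500, "c2": 4_200, "c3": 30_000,
           "c4": 800, "c5": 15_000, "c6": 3_100}
circular = {"c6"}  # e.g. contigs reported as circular by a tool such as VRCA

consensus = consensus_viral_contigs(predictions)
kept = size_or_circular_filter(consensus, lengths, circular)
print(sorted(kept))  # prints ['c1', 'c6']
```

In this toy example, c3 and c5 are long enough but are dropped because only one tool predicted them, and c2 passes the consensus step but is dropped because it is neither circular nor ≥10,000 bp. The contigs that remain would then go on to dereplication into vOTUs.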
