To identify DNA methyltransferases in publicly available viral genomes, we used both BLAST 2.7.1+ and HMMER 3.1b2 to generate alignments against a reference database derived from the experimentally characterized “Gold Standard” DNA methyltransferases in New England Biolabs’ REBASE (Roberts et al., 2015). To identify functional motifs of DNA methyltransferases, we used hmmscan with gathering cutoffs to collect the Pfams (release 31; El-Gebali et al., 2019) represented in our Gold Standard database, including PF05869.11 (Dam), PF00145.17 (DNA_methylase), PF07669.11 (Eco57I), PF13651.6 (EcoRI_methylase), PF12161.8 (HsdM_N), PF02086.15 (MethyltransfD12), PF02384.16 (N6_Mtase), PF01555.18 (N6_N4_Mtase), and PF12564.8 (TypeIII_RM_meth). Gold Standard methyltransferases that did not contain any of the Pfams used in this study are listed in “BLASTexceptions.fasta” and were used as queries in BLAST searches to identify putative DNA methyltransferases in viral proteomes. A viral protein was considered a DNA methyltransferase if either (1) its alignment to one of these Pfam profiles with hmmsearch exceeded the gathering cutoff, or (2) it aligned to a DNA methyltransferase sequence via BLAST with an e-value < 1E-5 and with the query protein being at least 75% of the alignment length.
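The two acceptance criteria described above can be sketched as a small decision function. This is an illustrative sketch only, not the authors' code: the class and function names (`HmmHit`, `BlastHit`, `is_putative_mtase`) are hypothetical, the 75% criterion is read literally as query length divided by alignment length, and a score equal to the gathering cutoff is treated as passing, as HMMER does when gathering cutoffs are applied.

```python
from dataclasses import dataclass
from typing import Iterable

EVALUE_MAX = 1e-5          # BLAST e-value threshold stated in the text
MIN_QUERY_FRACTION = 0.75  # assumed reading: query_length / alignment_length >= 0.75


@dataclass
class HmmHit:
    """One hmmsearch hit (hypothetical container, not a HMMER data structure)."""
    score: float             # bit score of the protein against a Pfam profile
    gathering_cutoff: float  # GA threshold of that Pfam profile


@dataclass
class BlastHit:
    """One BLAST hit (hypothetical container)."""
    query_length: int
    alignment_length: int
    evalue: float


def passes_hmm(hit: HmmHit) -> bool:
    # A hit passes when its score reaches the profile's gathering cutoff.
    return hit.score >= hit.gathering_cutoff


def passes_blast(hit: BlastHit) -> bool:
    # Guard against malformed rows before dividing.
    if hit.alignment_length <= 0:
        return False
    fraction = hit.query_length / hit.alignment_length
    return fraction >= MIN_QUERY_FRACTION and hit.evalue < EVALUE_MAX


def is_putative_mtase(hmm_hits: Iterable[HmmHit], blast_hits: Iterable[BlastHit]) -> bool:
    # Either line of evidence is sufficient on its own.
    return any(passes_hmm(h) for h in hmm_hits) or any(passes_blast(h) for h in blast_hits)
```

For example, a protein with a single BLAST hit of query length 300, alignment length 350, and e-value 1e-20 would be accepted under these assumptions, whereas the same hit with an e-value of 1e-3 would not.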
Viral assembly metadata from RefSeq, GenBank, and NCBI taxonomy were downloaded on June 26, 2018 and stored in “Viral_assemblydat.tsv” and “nodes.dmp.” If a virus was included in both RefSeq and GenBank, the RefSeq assembly was preferentially used as a query. Because viral assemblies deposited in GenBank are annotated using non-standardized approaches, we translated all reading frames of these viral genomes to avoid inconsistencies between annotation approaches. The DNA methyltransferase Pfams and the characterized methyltransferase sequences lacking Pfams (outlined above) were queried with HMMER and BLAST, respectively, to annotate putative DNA methyltransferases in the translated frames of each viral genome. Virus-to-host mappings were downloaded from the Virus-Host DB. Jupyter notebooks and associated source code used for DNA methyltransferase annotation can be found at www.github.com/SEpapoulis/MTannotation.
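The frame translation step can be illustrated with a short, dependency-free sketch. It is not taken from the authors' repository: the function names are hypothetical, "all frames" is assumed to mean the three forward and three reverse-complement frames, and the standard genetic code is assumed because the text does not name a translation table. Codons containing ambiguous bases are rendered as `X`.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: the text names no translation table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(genome: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons appear as `*` in the output, so each frame can be split on `*` to obtain candidate open reading frames before they are searched with HMMER or BLAST.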