Full-length genomic sequences of RNA viruses were downloaded in GenBank format from the NCBI website (http://www.ncbi.nlm.nih.gov/). The sequences were curated semi-automatically, using a custom Perl script followed by manual inspection, to exclude non-RNA viral genomes. ORFs annotated as using noncanonical codons or as being translated in an unconventional manner, such as by ribosomal frameshifting, were excluded. Next, using a custom Perl script, (i) sequences of ORFs with G1–2A6+ and G0A6+ motifs were parsed, (ii) translation products produced from RNA genomes with +1 or –1 base indels at each G1–2A6+ or G1A6+ motif were simulated, and (iii) the lengths of the simulated peptides were recorded. When an ORF contained no termination codon in the reading frame created by the simulated indel, the length of its translation product was set to 0. To correct for differences in the number of database entries per viral species, the list of motif sites and simulated peptide lengths was then normalized per motif site: when multiple entries for a virus shared the same length and the same start and stop codon coordinates of the original ORF containing the motif, only the one entry (accession) with the longest predicted amino acid sequence following the simulated indel was retained.
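The indel simulation and the per-site normalization described above can be illustrated with a minimal Python sketch. This is not the authors' Perl script, and it rests on several assumptions that the text does not state: the input sequence starts at the ORF start codon and extends downstream; a G1–2A6+ motif is read as one or two G residues followed by six or more A residues, with no constraint on flanking bases; the +1 and –1 indels are modeled as adding or removing one A at the end of the A run; the recorded length is the number of residues from the start codon to the first stop codon in the shifted frame; and the standard genetic code applies. The function names, the record keys, and the grouping key used for normalization are hypothetical.

```python
import re

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed reading of "G1-2A6+": one or two G followed by six or more A.
MOTIF = re.compile(r"G{1,2}A{6,}")


def peptide_length(seq):
    """Residues before the first in-frame stop codon; 0 if no stop is found."""
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous bases are absent from the table and are
        # treated as non-stop codons.
        if CODON_TABLE.get(seq[i:i + 3]) == "*":
            return i // 3
    return 0


def simulate_indels(seq):
    """Return (motif_start, motif, shift, length) for each motif and indel."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    results = []
    for m in MOTIF.finditer(seq):
        end = m.end()
        plus_one = seq[:end] + "A" + seq[end:]   # one extra A in the run
        minus_one = seq[:end - 1] + seq[end:]    # one A removed from the run
        results.append((m.start(), m.group(), +1, peptide_length(plus_one)))
        results.append((m.start(), m.group(), -1, peptide_length(minus_one)))
    return results


def pick_representatives(records):
    """Keep one record per virus, ORF coordinates, motif site, and shift:
    the one with the longest simulated peptide (assumed grouping key)."""
    best = {}
    for rec in records:
        key = (rec["virus"], rec["orf_start"], rec["orf_stop"],
               rec["motif_start"], rec["shift"])
        if key not in best or rec["length"] > best[key]["length"]:
            best[key] = rec
    return list(best.values())
```

For example, `simulate_indels("ATGGAAAAAATGACCTAAGG")` finds a single motif and returns one tuple for the +1 indel and one for the –1 indel. Ties in peptide length are resolved in favor of the first record encountered, which is an arbitrary choice of this sketch.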