Viral sequence analysis and simulation of translation products produced by TS

Yuka Hagiwara-Komoda
Sun Hee Choi
Masanao Sato
Go Atsumi
Junya Abe
Junya Fukuda
Mie N. Honjo
Atsushi J. Nagano
Keisuke Komoda
Kenji S. Nakahara
Ichiro Uyeda
Satoshi Naito

Full-length genomic sequences of RNA viruses were downloaded in GenBank format from the NCBI website (http://www.ncbi.nlm.nih.gov/). The sequences were curated semi-automatically, using a custom Perl script followed by manual inspection, to exclude non-RNA viral genomes. ORFs annotated as using noncanonical codons or as being translated in an unconventional manner, such as by ribosomal frameshifting, were excluded. Next, using a custom Perl script, (i) sequences of ORFs with G1–2A6+ and G0A6+ motifs were parsed, (ii) translation products produced from RNA genomes with +1 or –1 base indels at each G1–2A6+ or G1A6+ motif were simulated, and (iii) the lengths of the simulated peptides were recorded. When the reading frame resulting from a simulated indel contained no termination codon, the length of the translation product was set to 0. To correct for differences among viral species in the number of database entries, the list of motif sites and simulated peptide lengths was then normalized per motif site: when multiple entries (accessions) existed for a virus with the same length and the same start and stop codon coordinates of the original ORF containing the motif, only the one entry with the longest predicted amino acid sequence following the simulated indel was retained.
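The simulation steps can be illustrated with a minimal Python sketch. It is not the authors' Perl script, which is not reproduced here, and it rests on several assumptions: sequences use the DNA alphabet as in GenBank records; a motif is taken to be zero to two G residues followed by six or more A residues; an indel is modeled as a single A inserted or deleted at the start of the A run; translation is followed on the genomic sequence from the ORF start codon using the standard stop codons; and the "same length" criterion is read as the length of the sequence entry. All function names and record fields are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code, DNA alphabet
MOTIF = re.compile(r"G{0,2}(A{6,})")  # assumed reading of G1-2A6+ / G0A6+


def apply_indel(seq, a_run_start, shift):
    """Insert (+1) or delete (-1) one A at the start of the A run (assumption)."""
    if shift == 1:
        return seq[:a_run_start] + "A" + seq[a_run_start:]
    if shift == -1:
        return seq[:a_run_start] + seq[a_run_start + 1:]
    raise ValueError("shift must be +1 or -1")


def peptide_length(seq, orf_start):
    """Number of codons before the first in-frame stop codon; 0 if no stop is found."""
    for i in range(orf_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - orf_start) // 3
    return 0


def simulate(seq, orf_start, orf_end):
    """Map (motif_start, shift) to simulated peptide length for motifs in the ORF."""
    results = {}
    for match in MOTIF.finditer(seq, orf_start, orf_end):
        for shift in (1, -1):
            mutated = apply_indel(seq, match.start(1), shift)
            results[(match.start(), shift)] = peptide_length(mutated, orf_start)
    return results


def keep_longest(records):
    """Keep, per virus, entry length, ORF coordinates, motif site and shift,
    only the record with the longest simulated peptide."""
    best = {}
    for rec in records:
        key = (rec["virus"], rec["seq_len"], rec["orf_start"],
               rec["orf_end"], rec["motif_start"], rec["shift"])
        if key not in best or rec["length"] > best[key]["length"]:
            best[key] = rec
    return list(best.values())


# Toy sequence (made up for illustration): an 8-codon ORF with one GGAAAAAA motif.
toy = "ATGGGAAAAAACCTGATAGCTAAGTAA"
print(peptide_length(toy, 0))      # 8
print(simulate(toy, 0, len(toy)))  # {(3, 1): 7, (3, -1): 4}
```

In the toy sequence, the +1 insertion reads through to a stop codon after 7 codons and the –1 deletion after 4, whereas a shifted frame with no stop codon would be recorded as 0, as described in the text.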
