Definition of a reliable gene set by integrating the results from different gene prediction tools.

Yiqing Mao
Xianwei Yang
Yang Liu
Yanfeng Yan
Zongmin Du
Yanping Han
Yajun Song
Lei Zhou
Yujun Cui
Ruifu Yang

Gene prediction is fundamental to genome annotation, but different tools often produce different results because they rely on different algorithms. The widely used gene prediction software GLIMMER is based on an interpolated Markov model. GeneMarkS uses a hidden Markov model and an iterative self-learning algorithm for gene prediction, whereas Prodigal is based on a dynamic programming algorithm. Different versions of the same software may also generate different predictions. For example, GLIMMER 1.0 [61] was first released in 1998. In 2007, the GLIMMER 3.0 [62] update introduced major improvements over the original version, such as support for longer open reading frames, as well as ribosome binding site and overlapping gene predictions. In addition, the accuracy of genome annotation is affected by parameter settings and database updates.

The predictions for Y. pestis genomes produced by GLIMMER (threshold: overlap number = 1, gene length > 100, score > 30), GeneMarkS (parameter: prok, combine mode), and Prodigal differed, especially in the positions of the translation initiation sites (TISs) (Supplemental Table 4). Therefore, a predicted gene was defined as highly reliable only when all three prediction tools gave fully consistent results (i.e., both the predicted 3′ and 5′ ends of the gene were identical). In addition, this reliable gene set was used to adjust the positions of the TISs of 2,302 genes.
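The consistency criterion can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Gene` record, the function names, and the toy coordinates are hypothetical, and it assumes that each tool's output has already been parsed into (contig, start, end, strand) tuples in a shared coordinate system. A gene counts as reliable only if all tools report identical ends. The second function lists genes whose 3′ end is shared by all tools but whose 5′ end (TIS) differs, assuming each tool reports at most one gene per stop position.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Gene(NamedTuple):
    """Hypothetical record for one predicted gene (not any tool's native format)."""
    contig: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


StopKey = Tuple[str, str, int]


def stop_key(gene: Gene) -> StopKey:
    """Identify a gene by its 3' end: rightmost on '+', leftmost on '-'."""
    three_prime = gene.end if gene.strand == "+" else gene.start
    return (gene.contig, gene.strand, three_prime)


def five_prime(gene: Gene) -> int:
    """Return the 5' end (TIS side): leftmost on '+', rightmost on '-'."""
    return gene.start if gene.strand == "+" else gene.end


def reliable_genes(*prediction_sets: Iterable[Gene]) -> Set[Gene]:
    """Genes predicted with identical 5' and 3' ends by every tool."""
    sets = [set(p) for p in prediction_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


def tis_disagreements(*prediction_sets: Iterable[Gene]) -> Dict[StopKey, List[int]]:
    """Genes whose 3' end is shared by all tools but whose 5' ends differ.

    Assumes each tool reports at most one gene per 3' end.
    """
    per_tool = [{stop_key(g): five_prime(g) for g in p} for p in prediction_sets]
    if not per_tool:
        return {}
    shared = set.intersection(*(set(d) for d in per_tool))
    result = {}
    for key in shared:
        starts = [d[key] for d in per_tool]
        if len(set(starts)) > 1:
            result[key] = starts
    return result


# Toy predictions (made-up coordinates, for illustration only).
glimmer = [Gene("chr", 100, 400, "+"), Gene("chr", 500, 900, "-"), Gene("chr", 1000, 1300, "+")]
genemarks = [Gene("chr", 100, 400, "+"), Gene("chr", 500, 870, "-"), Gene("chr", 1000, 1300, "+")]
prodigal = [Gene("chr", 100, 400, "+"), Gene("chr", 500, 900, "-"), Gene("chr", 1030, 1300, "+")]

reliable = reliable_genes(glimmer, genemarks, prodigal)
conflicts = tis_disagreements(glimmer, genemarks, prodigal)
print(sorted(reliable))
print(conflicts)
```

In this toy input only the first gene satisfies the criterion. The other two share a stop position across tools but differ in the predicted TIS, which is the kind of disagreement described above.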
