Abstract
Hypothetical proteins (HPs) are proteins that have not been characterized in the laboratory and so remain “orphaned” in genomic databases. Considerable progress has recently been made in characterizing HPs experimentally. Methods such as sequence capture and next-generation sequencing (NGS) have been used to rapidly identify HP functions and the genes that encode them. Applications such as the isolation of single genes are greatly facilitated by pull-down assays used to characterize proteins, and there are also methods to extract proteins from either the whole cell or a subcellular fraction. However, some of these assays are expensive and laborious, and the characterization of HP function remains imperfect. More recently, statistical interpretation of in silico selection strategies has improved the identification of the most promising candidates, including those derived from annotation methods such as protein interaction networks (PINs). Given that improvements in technology have permitted a substantial increase in computational annotation, we ask whether the in silico prediction of HP function (the validation of models through algorithms and data subsets) could likewise be improved. In this work, we apply a bioinformatics analogy to each step of a wet lab experiment performed to confirm aspects of protein function. Although it may be a less definitive approach, assigning a putative function based on conservation observed in homologous protein sequences may be worth considering before a wet lab experiment.
Keywords: Hypothetical proteins, Omics, Systems biology, Functional genomics, Annotation
Procedure
Experiment steps and bioinformatics analogies
Representative data (example)
In a framework for functional prediction (Figure 1), experimentally determined characteristics of the putative interaction partners are used to build an interactome of hypothetical proteins (the hypothome; Desler et al., 2014). In this process, we suggest a role for the predicted protein in a biological context, thus complementing an interactome with the interactions of predicted proteins while retaining information on whether each interaction is predicted or experimentally verified (left panel in Figure 1). This strategy is essential for characterizing predicted proteins and their interactions with existing biological pathways. Furthermore, electronic annotation using the methods described in Benso et al. (2013), which draw on similar yet non-interacting proteins (similactors) (right panel in Figure 1), can be combined with the hypothome data to form training datasets. A simulation followed by machine learning predictions can also be applied to a wide range of proteins, not only HPs, thereby allowing an inference to be drawn by analogy to functional prediction.
Figure 1. A framework for functional prediction. Experimentally determined characteristics of the putative interaction partners are used to build an interactome of hypothetical proteins. Left panel: methods for building an interactome of hypothetical proteins as described by Desler et al. (2014). Right panel: electronic annotation methods described by Benso et al. (2013).
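As a rough illustration of the hypothome idea, the following minimal Python sketch keeps only the interactions that involve a hypothetical protein, retains whether each interaction is predicted or experimentally verified, and then suggests a putative function by a simple majority vote over the annotated partners. The protein identifiers, annotations, and function names (`build_hypothome`, `suggest_function`) are hypothetical toy values, and the majority vote is an assumed stand-in for illustration only; it is not the procedure of Desler et al. (2014) or Benso et al. (2013).

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (protein_a, protein_b, evidence type).
interactions = [
    ("HP1", "P1", "experimental"),
    ("HP1", "P2", "predicted"),
    ("HP1", "P3", "experimental"),
    ("HP2", "P3", "predicted"),
    ("P1", "P2", "experimental"),
]

# Hypothetical annotations of already characterized proteins.
annotations = {"P1": "DNA repair", "P2": "DNA repair", "P3": "transport"}

# Proteins treated as hypothetical (uncharacterized) in this toy example.
hypothetical = {"HP1", "HP2"}


def build_hypothome(interactions, hypothetical):
    """Collect, per HP, its interaction partners and the evidence type."""
    hypothome = defaultdict(list)
    for a, b, evidence in interactions:
        if a in hypothetical:
            hypothome[a].append((b, evidence))
        if b in hypothetical:
            hypothome[b].append((a, evidence))
    return dict(hypothome)


def suggest_function(partners, annotations):
    """Return the most common annotation among partners, or None."""
    votes = Counter(
        annotations[partner] for partner, _ in partners if partner in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


hypothome = build_hypothome(interactions, hypothetical)
suggestions = {
    hp: suggest_function(partners, annotations)
    for hp, partners in hypothome.items()
}
```

Any function suggested this way is only a putative assignment. In practice the evidence type kept with each edge could be used to weight experimentally verified interactions more heavily than predicted ones.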
Acknowledgments
We gratefully acknowledge Alfredo Benso and his colleagues for proposing the similactors approach alongside the hypothome. The authors received no funding. PS would like to thank Arsalan Daudi and Fanglian He for the invitation to write this manuscript.
References