2.2.3. g-gap Dipeptide Composition (g-gap DC)
This protocol is extracted from research article:
iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins
Comput Math Methods Med, Jan 7, 2021; DOI: 10.1155/2021/6664362

The amino acid composition (AAC) and dipeptide composition (DC) encoding strategies have been widely used for protein prediction [28–30]. However, they can only express the fraction of each amino acid type or the sequence-order information of adjacent residues within a protein. In fact, residues separated by an interval in the primary sequence may be spatially closer in the tertiary structure, especially in regular secondary structures such as the alpha helix and beta sheet, in which two nonadjacent residues are connected by hydrogen bonds. In other words, interval residues can be more biologically significant than adjacent residues. Hence, the g-gap dipeptide composition (g-gap DC) feature encoding strategy was proposed to calculate the frequency of amino acid pairs separated by any g residues.

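A protein P can then be formulated as

$$P = \left[f_1^g, f_2^g, \ldots, f_{400}^g\right]^T$$

where $f_i^g$ represents the frequency of the i-th (i = 1, 2, 3, ⋯, 400) g-gap dipeptide and can be calculated by

$$f_i^g = \frac{n_i^g}{L - g - 1}$$

where $n_i^g$ denotes the number of occurrences of the i-th g-gap dipeptide and L is the length of protein P, so that L − g − 1 is the total number of g-gap residue pairs in the sequence. In particular, when g = 0, the g-gap DC method is equivalent to the adjoining DC.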
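The frequency calculation described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that counts are normalized by L − g − 1 (the number of residue pairs separated by g positions in a sequence of length L), that the 400 dipeptides are ordered alphabetically by one-letter code, and that pairs containing non-standard residues are skipped. The names `g_gap_dc`, `AMINO_ACIDS`, `DIPEPTIDES`, and `INDEX` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs; alphabetical ordering is an assumption.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def g_gap_dc(sequence, g=0):
    """Return the 400-dimensional g-gap dipeptide composition of a sequence."""
    seq = sequence.upper()
    n_pairs = len(seq) - g - 1  # L - g - 1 residue pairs separated by g residues
    if g < 0 or n_pairs <= 0:
        raise ValueError("need g >= 0 and a sequence longer than g + 1")
    counts = [0] * len(DIPEPTIDES)
    for k in range(n_pairs):
        pair = seq[k] + seq[k + g + 1]  # residues at positions k and k + g + 1
        idx = INDEX.get(pair)
        if idx is not None:  # assumption: ignore pairs with non-standard residues
            counts[idx] += 1
    return [c / n_pairs for c in counts]


# Example: in "ACDA" with g = 1 the pairs are A_D and C_A, each with frequency 1/2.
features = g_gap_dc("ACDA", g=1)
```

With `g=0` the function reduces to the ordinary adjoining dipeptide composition, so feature sets for several gap values can be produced by calling it once per value of g.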
