2.2.4. Pseudo Amino Acid Composition (PseAAC)
This protocol is extracted from research article:
iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins
Comput Math Methods Med, Jan 7, 2021; DOI: 10.1155/2021/6664362

The pseudo amino acid composition (PseAAC), proposed by Chou [31], is an efficient and widely used method for converting a protein sequence into a feature vector for machine-learning-based predictors [32-34]. In this work, we adopted the type-II PseAAC to represent protein samples. This representation combines the amino acid dipeptide composition with the correlation of physicochemical properties between two residues. Accordingly, each BLP (or non-BLP) sequence sample can be denoted as a (20^2 + nλ)-dimensional, i.e., (400 + nλ)-dimensional, vector which is formulated as follows:
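The displayed equation is not reproduced in this extract. A reconstruction from the symbols defined in this section (an assumption based on the standard type-II PseAAC form, not the authors' typeset equation) would read:

```latex
P = \left[ x_1,\ x_2,\ \ldots,\ x_{400},\ x_{401},\ \ldots,\ x_{400+n\lambda} \right]^{T}
```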

where n is the number of amino acid physicochemical properties considered, namely hydrophobicity, hydrophilicity, mass, pK1, pK2, pI, rigidity, flexibility, and irreplaceability, as used in [35]; thus, n = 9 here. Since the first six properties have been widely used in protein bioinformatics, we briefly discuss only the latter three: rigidity, flexibility, and irreplaceability. Gottfries et al. [36] pointed out that the rigidity and flexibility of amino acid side chains are key to the formation of polypeptides and local protein domains associated with alterations in protein properties. Moreover, the rigidity and flexibility properties of sequences have been used to predict conformation and protein fold changes, and these predictions were verified by NMR measurement [37]. In addition, residues differ in how readily they are replaced during evolution; irreplaceability thus reflects the response to mutational deterioration in the course of the evolution of life [38]. The original values of the nine physicochemical properties can be accessed at http://lin-group.cn/server/iBLP/download.html. λ represents the rank of correlation. x_u (u = 1, 2, ⋯, 400 + nλ) stands for the frequency of each element and can be calculated as follows:
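The displayed equation is likewise missing from this extract. Under the standard type-II PseAAC normalization, and using the symbols defined in this section, it would take roughly the following form. This is a reconstruction, and the index shift u - 400 on the correlation factor is a notational assumption:

```latex
x_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{n\lambda} \varphi_j}, & 1 \le u \le 400 \\[2ex]
\dfrac{\omega\, \varphi_{u-400}}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{n\lambda} \varphi_j}, & 400 < u \le 400 + n\lambda
\end{cases}
```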

where f_u represents the frequency of the u-th of the 400 dipeptides, ω is the weight factor for the sequence order effect, and φ_j represents the j-tier sequence correlation factor of the physicochemical properties between residues. Given that this method has been commonly used and the detailed definitions of its parameters can be found elsewhere [32], we do not reiterate them here.
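To make the construction concrete, the following is a minimal Python sketch of a type-II PseAAC feature vector built on dipeptide frequencies. It is not the authors' implementation. The function names `standardize` and `pseaac_type2` are hypothetical. The two property tables are toy placeholders rather than the real physicochemical values. The series-correlation form of φ (the average product of standardized property values for residues j positions apart) is assumed from the general PseAAC literature, since the source defers that definition to [32].

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered dipeptides.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def standardize(prop):
    """Scale one property to zero mean and unit variance over the 20 residues."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac_type2(seq, props, lam, omega):
    """Return a (400 + n*lam)-dimensional vector, where n = len(props)."""
    if any(res not in AMINO_ACIDS for res in seq):
        raise ValueError("sequence contains a non-standard residue")
    length = len(seq)
    if length < 2 or length <= lam:
        raise ValueError("sequence must be longer than lam and have >= 2 residues")

    # f_u: frequency of each of the 400 dipeptides.
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(length - 1):
        counts[seq[i:i + 2]] += 1
    f = [counts[d] / (length - 1) for d in DIPEPTIDES]

    # phi: one correlation factor per (tier j, property) pair, n*lam in total.
    # Assumed form: mean of H(R_i) * H(R_{i+j}) over the sequence.
    scaled = [standardize(p) for p in props]
    phi = []
    for j in range(1, lam + 1):
        for h in scaled:
            total = sum(h[seq[i]] * h[seq[i + j]] for i in range(length - j))
            phi.append(total / (length - j))

    # Shared denominator normalizes both parts of the vector.
    denom = sum(f) + omega * sum(phi)
    return [v / denom for v in f] + [omega * p / denom for p in phi]


# Toy property tables (placeholders, NOT real hydrophobicity etc.).
toy_a = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
toy_b = {aa: float((7 * i) % 20) for i, aa in enumerate(AMINO_ACIDS)}

vec = pseaac_type2("MKTAYIAKQRQISFVKSHFSRQ", [toy_a, toy_b], lam=2, omega=0.05)
print(len(vec))  # 404 = 400 + n*lam with n = 2, lam = 2
```

With the nine properties described above, the same call would yield a vector of length 400 + 9λ. Because type-II correlation factors can be negative, a large ω can push the denominator toward zero; the small ω used here is only an illustrative choice.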
