The pseudo amino acid composition (PseAAC), proposed by Chou [31], is an efficient and widely used method for converting a protein sequence into a feature vector, and it has served as the basis for many machine-learning predictors [32–34]. In this work, we adopted the type-II PseAAC to represent protein samples. This representation combines the amino acid dipeptide composition with the correlation of physicochemical properties between two residues. Accordingly, each BLP (or non-BLP) sequence sample can be denoted as a (400 + nλ)-dimensional vector, which is formulated as follows:
\[ P = [x_1, x_2, \ldots, x_{400}, x_{401}, \ldots, x_{400+n\lambda}]^{T} \]
where n is the number of amino acid physicochemical properties considered: hydrophobicity, hydrophilicity, mass, pK1, pK2, pI, rigidity, flexibility, and irreplaceability, as used in [35]; thus, n = 9 here. Because the first six properties are widely used in protein bioinformatics, we briefly discuss only the latter three: rigidity, flexibility, and irreplaceability. Gottfries et al. [36] pointed out that the rigidity and flexibility of amino acid side chains are key to the formation of polypeptides and of local protein domains associated with alterations in protein properties. Moreover, the rigidity and flexibility of sequences have been used to predict conformation and protein fold changes, and these predictions were verified by NMR measurement [37]. In addition, residues differ in how difficult they are to replace during evolution; irreplaceability thus reflects the response to mutational deterioration over the course of the evolution of life [38]. The original values of the nine physicochemical properties can be accessed at http://lin-group.cn/server/iBLP/download.html. λ represents the rank of correlation. x_u (u = 1, 2, ⋯, 400 + nλ) stands for the frequency of each element and can be calculated as follows:
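In the standard type-II PseAAC formulation, the elements take the form below. This is a reconstruction from the symbols defined in the surrounding text, and the index shift in φ_{u−400} is an assumption about the notation:

```latex
x_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{n\lambda} \varphi_j}, & 1 \le u \le 400 \\[2ex]
\dfrac{\omega\, \varphi_{u-400}}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{n\lambda} \varphi_j}, & 400 < u \le 400 + n\lambda
\end{cases}
```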
where f_u represents the frequency of the u-th of the 400 dipeptides, ω is the weight factor for the sequence-order effect, and φ_u represents the j-tier sequence correlation factor of the physicochemical properties between residues. Because this method is commonly used and the detailed definitions of its parameters can be found elsewhere [32], we do not reiterate them here.
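As an illustration of how such a vector could be assembled, the following minimal Python sketch computes dipeptide frequencies and series-correlation factors and normalizes them jointly. It is not the authors' implementation: the function names, the standardization of each property over the 20 residues, the product form of the correlation factor, the parameter values, and the toy property tables (placeholders, not real physicochemical values) are all assumptions made for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def standardize(table):
    """Scale one property to zero mean and unit variance over the 20 residues."""
    values = [table[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (table[a] - mean) / std for a in AMINO_ACIDS}


def pseaac_type2(seq, properties, lam, weight):
    """Return a (400 + n*lam)-element feature list for one sequence.

    properties: list of n dicts mapping residue letter -> property value.
    Residues outside the 20 standard letters raise KeyError.
    """
    if len(seq) <= max(lam, 1):
        raise ValueError("sequence must be longer than lam")
    scaled = [standardize(t) for t in properties]

    # Dipeptide frequencies f_1..f_400 from overlapping windows.
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    freqs = [counts[d] / total for d in DIPEPTIDES]

    # Correlation factors, ordered tier by tier, then property by property.
    # Assumed form: mean product of standardized values j residues apart.
    phis = []
    for j in range(1, lam + 1):
        pairs = len(seq) - j
        for h in scaled:
            phis.append(sum(h[seq[i]] * h[seq[i + j]] for i in range(pairs)) / pairs)

    # Joint normalization so that all elements sum to 1.
    denom = sum(freqs) + weight * sum(phis)
    return [f / denom for f in freqs] + [weight * p / denom for p in phis]


# Placeholder property tables (NOT real physicochemical values).
toy_a = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
toy_b = {a: float((7 * i) % 20) for i, a in enumerate(AMINO_ACIDS)}

# Hypothetical sequence and parameter choices, for illustration only.
vec = pseaac_type2("MKTAYIAKQRQISFVKSHFSRQ", [toy_a, toy_b], lam=3, weight=0.5)
print(len(vec))  # 400 + n*lam elements, here with n = 2 and lam = 3
```

To approximate the setting described above, the two toy tables would be replaced by the nine property tables from the download page, giving n = 9. Note that the correlation factors can be negative, so individual elements of the vector are not guaranteed to be positive in this sketch.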