2.2.2. Composition/Transition/Distribution (CTD)
This protocol is extracted from research article:
iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins
Comput Math Methods Med, Jan 7, 2021; DOI: 10.1155/2021/6664362

The composition, transition, and distribution (CTD) method was first proposed by Dubchak et al. [24] in 1995 for predicting protein folding classes. Its three descriptors, composition (C), transition (T), and distribution (D), rest on two hypotheses: (i) a sequence of amino acids can be transformed into a sequence of structural or physicochemical properties of its residues; (ii) following the main clusters of the amino acid indices of Tomii and Kanehisa [25], the twenty amino acids can be divided into three groups for each of 13 physicochemical attributes, which cover hydrophobicity, normalized van der Waals volume, polarity, polarizability, charge, secondary structure, and solvent accessibility. The groups of amino acids are listed in Table 2, and the grouping criteria are detailed in a previous study [26].

For each attribute, the three descriptors are computed as follows. Composition is the percentage of residues in the sequence that belong to each group, which yields 3 features. Transition is the frequency with which two neighboring residues belong to two different groups, with one value per pair of groups, which also yields 3 features. Distribution describes where along the sequence the residues of a group occur, namely the positions of the first, 25%, 50%, 75%, and 100% of that group's residues, which yields 5 features per group and therefore 15 per attribute. Finally, based on the CTD method [27], a sample protein P can be formulated as a (3 + 3 + 15) × 13 = 273-dimensional feature vector.
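
The three descriptors can be illustrated for a single attribute with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the hydrophobicity grouping is the commonly cited Dubchak-style split and may differ from Table 2, and the rounding rule used to pick the 25%, 50%, and 75% residues (floor, with a minimum of 1) is an assumption.

```python
import math

# Illustrative three-way grouping for one attribute (hydrophobicity).
# Assumption: the groupings actually used in the study are those of Table 2.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def encode(sequence, groups):
    """Map each residue to its group index (1, 2 or 3); unknown residues are skipped."""
    lookup = {aa: i + 1 for i, members in enumerate(groups) for aa in members}
    return [lookup[aa] for aa in sequence.upper() if aa in lookup]


def composition(encoded):
    """C: fraction of residues falling in each group (3 values)."""
    n = len(encoded)
    return [encoded.count(g) / n for g in (1, 2, 3)]


def transition(encoded):
    """T: fraction of adjacent pairs that switch between each pair of groups (3 values)."""
    pairs = list(zip(encoded, encoded[1:]))
    return [
        sum(1 for p in pairs if set(p) == {a, b}) / len(pairs)
        for a, b in ((1, 2), (1, 3), (2, 3))
    ]


def distribution(encoded):
    """D: per group, position (% of length) of its first, 25%, 50%, 75% and last residue (15 values)."""
    n = len(encoded)
    features = []
    for g in (1, 2, 3):
        positions = [i + 1 for i, code in enumerate(encoded) if code == g]
        if not positions:
            features.extend([0.0] * 5)  # group absent from the sequence
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            # k-th occurrence of the group (1-based); floor rounding is an assumption
            k = max(1, math.floor(frac * len(positions)))
            features.append(100.0 * positions[k - 1] / n)
    return features


def ctd_one_attribute(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Concatenate C, T and D for one attribute: 3 + 3 + 15 = 21 features."""
    encoded = encode(sequence, groups)
    if len(encoded) < 2:
        raise ValueError("need at least two recognised residues")
    return composition(encoded) + transition(encoded) + distribution(encoded)


features = ctd_one_attribute("RGCRGCMKVLA")
print(len(features))       # features for one attribute
print(len(features) * 13)  # total dimension over 13 attributes
```

Repeating the calculation with the grouping of each of the 13 attributes and concatenating the results gives the 21 × 13 = 273 features described above.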

Table 2. Amino acid physicochemical attributes used in the CTD method and the three corresponding groups of amino acids for each attribute.

