The mathematical relation between FTIR spectra and secondary structure content was established as described earlier (De Meutter and Goormaghtigh 2021b). Briefly, ascending stepwise linear regression (ASLR) introduces one absorbance at a time, in an ascending stepwise manner, into a linear regression model (Goormaghtigh et al. 2006, 2009). Partial least squares (PLS) regression is a multivariate approach that minimizes the number of latent variables (LVs) required for prediction (Geladi and Kowalski 1986; Wold et al. 2001). It was computed with the software developed by Nørgaard et al., running under Matlab (Nørgaard et al. 2000; Leardi and Nørgaard 2005). Support vector machine (SVM) regression (Tange et al. 2015; Ghorbani et al. 2016) was used according to the formulation introduced by Suykens et al., with the Matlab toolbox built by the authors (Pelckmans et al. 2002).
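The idea of adding one absorbance at a time can be illustrated with a minimal Python sketch of greedy forward selection. This is not the authors' Matlab implementation: the function name `forward_stepwise`, the selection criterion (largest drop in residual sum of squares) and all numbers below are assumptions made for illustration only.

```python
def center(v):
    """Subtract the mean (this takes care of the regression intercept)."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_stepwise(spectra, content, n_steps):
    """Greedy forward selection of absorbances (columns of `spectra`).

    Assumed criterion: at each step, add the column that most reduces
    the residual sum of squares (RSS) of the linear model.
    Returns the chosen column indices and the RSS after each step.
    """
    n_cols = len(spectra[0])
    cols = [center([row[j] for row in spectra]) for j in range(n_cols)]
    resid = center(content)
    chosen, rss = [], []
    for _ in range(n_steps):
        gains = {}
        for j in range(n_cols):
            norm = dot(cols[j], cols[j])
            if j not in chosen and norm > 1e-12:
                # RSS reduction obtained by adding column j to the model
                gains[j] = dot(resid, cols[j]) ** 2 / norm
        if not gains:
            break
        best = max(gains, key=gains.get)
        c, cc = cols[best], dot(cols[best], cols[best])
        coef = dot(resid, c) / cc
        resid = [r - coef * ci for r, ci in zip(resid, c)]
        # Remove the chosen direction from the columns not yet in the model
        for j in range(n_cols):
            if j != best and j not in chosen:
                k = dot(cols[j], c) / cc
                cols[j] = [a - k * ci for a, ci in zip(cols[j], c)]
        chosen.append(best)
        rss.append(dot(resid, resid))
    return chosen, rss


# Hypothetical data: 6 proteins x 4 wavenumbers, and made-up helix content (%)
spectra = [
    [0.10, 0.50, 0.31, 0.22],
    [0.40, 0.10, 0.52, 0.35],
    [0.30, 0.90, 0.18, 0.41],
    [0.80, 0.20, 0.75, 0.12],
    [0.50, 0.70, 0.44, 0.30],
    [0.65, 0.35, 0.60, 0.27],
]
helix = [12.0, 35.0, 9.0, 61.0, 30.0, 47.0]
chosen, rss = forward_stepwise(spectra, helix, n_steps=2)
print("selected columns:", chosen, "RSS per step:", rss)
```

In practice the number of absorbances to keep would be chosen from a validation error rather than fixed in advance, as `n_steps` is here.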
Two types of validation were performed. Cross-validation was run in leave-one-out mode, i.e., one protein spectrum at a time was removed from the training set and used to challenge the model obtained with all the other proteins. The quality of the prediction was computed as the root mean square standard error in cross-validation (RMSECV). This error was compared with the standard deviation of the secondary structure content (STDDEVREFCV) by computing ζCV = STDDEVREFCV/RMSECV (Oberg et al. 2004; Kinalwa et al. 2010). ζ indicates how much better the prediction is than simply guessing the mean value. For instance, a value of ζ = 3 for the α-helix, whose content distribution in cSP92 is characterized by STDDEVREFCV = 18.3%, means that the error of prediction is 6.1%. A value of ζ close to 1 indicates that spectroscopy adds little value to secondary structure prediction. Note that ζ is related to the correlation coefficient (Fearn 2002).
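The leave-one-out procedure and the ζ ratio can be sketched in a few lines of Python. The sketch assumes a deliberately simple model (one absorbance, ordinary least squares) and uses the sample standard deviation for STDDEVREF. Both choices, the helper names and the data are illustrative assumptions, not the published procedure.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_rmsecv(x, y):
    """Leave-one-out: predict each sample with a model fitted on all the others."""
    sq_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        sq_errors.append((a + b * x[i] - y[i]) ** 2)
    return math.sqrt(statistics.fmean(sq_errors))


def zeta(reference, rmse):
    """zeta = STDDEVREF / RMSE (sample standard deviation assumed)."""
    return statistics.stdev(reference) / rmse


# Hypothetical absorbances at one wavenumber and made-up helix content (%)
absorbance = [0.10, 0.22, 0.29, 0.41, 0.52, 0.58, 0.71, 0.80]
helix = [8.0, 19.0, 24.0, 37.0, 45.0, 52.0, 61.0, 72.0]

rmsecv = loo_rmsecv(absorbance, helix)
zeta_cv = zeta(helix, rmsecv)
print(f"RMSECV = {rmsecv:.2f}%, zeta_CV = {zeta_cv:.2f}")

# Worked example from the text: zeta = 3 and STDDEVREF = 18.3% give an error of about 6.1%
rmse_from_zeta = 18.3 / 3
```

Read the other way round, RMSE = STDDEVREF/ζ, which is how the 6.1% figure in the text follows from ζ = 3 and STDDEVREFCV = 18.3%.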
The second validation used a single subset of the cSP92 protein spectra as the test set. The Kennard–Stone algorithm (Kennard and Stone 1969) was used to select one third of the spectra with a uniform distribution of the secondary structure content. The quality of the prediction was judged from the root mean square error of prediction for the Kennard–Stone selected test set (RMSEKS), and ζKS was computed as STDDEVREFKS/RMSEKS. Note that STDDEVREFCV differs from STDDEVREFKS.
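A minimal Python sketch of Kennard–Stone selection follows. It assumes the usual max-min formulation with Euclidean distances, applied here to made-up (helix %, sheet %) pairs. The function name `kennard_stone` and the data are hypothetical, and the space in which the authors measured distances is not specified beyond "secondary structure content".

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed the selection with the two mutually most distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected point is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical (helix %, sheet %) content for 9 proteins
content = [
    (5.0, 40.0), (12.0, 35.0), (20.0, 30.0),
    (33.0, 22.0), (41.0, 18.0), (48.0, 15.0),
    (60.0, 10.0), (72.0, 5.0), (85.0, 2.0),
]
test_idx = kennard_stone(content, len(content) // 3)  # one third as test set
train_idx = [i for i in range(len(content)) if i not in test_idx]
print("test set:", sorted(test_idx), "training set:", train_idx)
```

Because the selected proteins are spread evenly over the content range, their standard deviation generally differs from that of the full set, which is consistent with STDDEVREFKS and STDDEVREFCV having different values.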