Partial least squares regression (PLSR) was applied to the calibration set to build the model that was subsequently used to predict tomatine content in industrial tomato samples. In addition, tomatine concentration values were log-transformed to make their distribution approximately symmetrical.
A kernel PLS algorithm was used to correlate the logarithm of tomatine concentration with the spectral features, transformed as mentioned previously [47]. The optimal number of latent variables for the regression function was estimated by cross-validation, taking into account the trend of the cross-validation root mean square error (RMSECV) with respect to the number of retained components. In the present work, only three components were necessary to explain the majority of the total X- and Y-variance. Once the optimal number of latent variables was chosen, Martens’ uncertainty test [48] was performed in order to remove unimportant information from the spectral data. This tool allowed us to improve the predictive ability of the model: only the significant variables were retained, which gave a more reliable estimate of the prediction error when the model was tested on new samples. Moreover, since a reduced number of spectral variables was used, a simpler model was generated.
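The selection of the number of latent variables from the RMSECV trend can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, uses a plain NIPALS-style PLS1 instead of the kernel algorithm (for a single response the two are expected to yield equivalent coefficients), assumes 10 venetian-blind cross-validation segments, and runs on synthetic data standing in for spectra and log-transformed tomatine values. The names `pls1_fit`, `venetian_blinds` and `rmsecv_curve` are hypothetical, and picking the minimum of the curve is a simplification of inspecting its trend.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS-style PLS1 (assumed stand-in for kernel PLS); returns (b, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b

def venetian_blinds(n_samples, n_splits):
    """Segment i holds samples i, i + n_splits, i + 2 * n_splits, ..."""
    return [np.arange(i, n_samples, n_splits) for i in range(n_splits)]

def rmsecv_curve(X, y, max_comp, n_splits=10):
    """RMSECV for 1..max_comp latent variables."""
    n = len(y)
    errors = []
    for k in range(1, max_comp + 1):
        resid = np.empty(n)
        for test_idx in venetian_blinds(n, n_splits):
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b, b0 = pls1_fit(X[train_idx], y[train_idx], k)
            resid[test_idx] = y[test_idx] - (X[test_idx] @ b + b0)
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(errors)

# Synthetic data (assumption): 60 samples, 40 "wavelengths", 3 latent factors.
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 40)) * np.array([[3.0], [1.0], [0.3]])
X = scores @ loadings + 0.05 * rng.normal(size=(60, 40))
log_y = scores @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=60)

rmsecv = rmsecv_curve(X, log_y, max_comp=8)
best_k = int(np.argmin(rmsecv)) + 1   # simplification: minimum of the curve
for k, err in enumerate(rmsecv, start=1):
    print(f"{k} latent variable(s): RMSECV = {err:.4f}")
print("latent variables at minimum RMSECV:", best_k)
```

In practice the curve is usually read for the point where adding components no longer gives a meaningful drop in RMSECV, which may be fewer components than the strict minimum.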
In Martens’ uncertainty test, the regression coefficients Bi of each cross-validation sub-model (segments chosen with the venetian blind option) were calculated, and their differences from the regression coefficients of the total model, Btot, were computed. The sum of the squared differences over all sub-models was then evaluated to obtain an expression for the variance of the Bi estimate at a specific wavelength. The significance of each Bi estimate was assessed with a t-test (95% confidence level), and the resulting regression coefficients were presented with uncertainty limits. Variables whose uncertainty limits did not include zero were considered significant. This procedure was repeated iteratively until the difference between the root mean square error of calibration (RMSEC) and RMSECV reached a minimum value.
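A single pass of this jackknife-style test could look roughly as follows. This is a sketch under stated assumptions, not the software used in the study: it assumes NumPy, a NIPALS-style PLS1 in place of the kernel algorithm, 10 venetian-blind segments, a delete-group jackknife scaling factor of (M - 1)/M on the summed squared differences, and M - 1 = 9 degrees of freedom for the t-test (two-sided 95% critical value 2.262). The function names and the synthetic data are hypothetical, and the iterative refitting on the retained variables is left out.

```python
import numpy as np

def pls1_coefficients(X, y, n_comp):
    """NIPALS-style PLS1 on mean-centred data; returns the coefficient vector."""
    Xr, yr = X - X.mean(axis=0), y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def uncertainty_test(X, y, n_comp, n_splits=10, t_crit=2.262):
    """One pass of a Martens-style jackknife test on PLS coefficients.

    t_crit is the two-sided 95% Student t value for n_splits - 1 = 9 degrees
    of freedom (the choice of degrees of freedom is an assumption).
    """
    n = len(y)
    b_tot = pls1_coefficients(X, y, n_comp)       # total model
    sq_diff = np.zeros_like(b_tot)
    for i in range(n_splits):
        held_out = np.arange(i, n, n_splits)      # venetian-blind segment
        keep = np.setdiff1d(np.arange(n), held_out)
        b_i = pls1_coefficients(X[keep], y[keep], n_comp)
        sq_diff += (b_i - b_tot) ** 2             # squared difference per variable
    # Assumed jackknife scaling of the summed squared differences.
    std_err = np.sqrt(sq_diff * (n_splits - 1) / n_splits)
    half_width = t_crit * std_err                 # uncertainty limits: b_tot +/- half_width
    significant = np.abs(b_tot) > half_width      # limits do not include zero
    return b_tot, half_width, significant

# Synthetic data (assumption): variables 0-9 carry signal, 10-29 are noise.
rng = np.random.default_rng(1)
scores = rng.normal(size=(60, 3))
X = 0.3 * rng.normal(size=(60, 30))
X[:, :10] = scores @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(60, 10))
log_y = scores @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

b_tot, half_width, significant = uncertainty_test(X, log_y, n_comp=3)
print("variables retained:", np.flatnonzero(significant))
```

To mimic the iterative procedure, the model would be refitted on `X[:, significant]` and the test repeated, tracking RMSEC and RMSECV at each round.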