3.4. Multivariate Data Analysis

Cláudia Andreia Teixeira dos Santos
Ricardo Nuno Mendes Jorge Páscoa
Nuria Pérez-del-Notario
José Maria González-Sáiz
Consuelo Pizarro
João Almeida Lopes

The following nine wine compounds were considered: isoamyl alcohol, isobutanol, 1-hexanol, butyric acid, isobutyric acid, decanoic acid, ethyl acetate, furfural and acetoin. Before the spectra were used, their validity was checked by building a principal component analysis (PCA) model to identify possible outliers. The model was developed on Savitzky–Golay first-order derivative spectra in the regions 1580–1201 cm⁻¹ and 3050–2601 cm⁻¹. It comprised three principal components, which accounted for 95.6% of the total variance in these regions. Based on the model residuals (sum of squared residuals) and the Hotelling's T² statistic (weighted sum of squared scores), two samples were identified as outliers and excluded from the sample sets.
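The outlier screening described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB code used in the study. The function name `pca_outlier_stats`, the synthetic data and the planted outlier are hypothetical, and the derivative preprocessing is assumed to have been applied beforehand. No control limits are computed, because the text does not state which limits were used to flag the two outliers.

```python
import numpy as np

def pca_outlier_stats(X, n_components=3):
    """Return Hotelling's T^2, the sum of squared residuals (Q) per sample,
    and the fraction of variance explained by the retained components."""
    Xc = X - X.mean(axis=0)                        # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                        # loadings (variables x components)
    T = Xc @ P                                     # scores (samples x components)
    # Variance of each score vector (eigenvalues of the covariance matrix)
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(T ** 2 / eigvals, axis=1)          # weighted sum of squared scores
    residuals = Xc - T @ P.T                       # part of the data not modelled
    q = np.sum(residuals ** 2, axis=1)             # sum of squared residuals
    explained = np.sum(s[:n_components] ** 2) / np.sum(s ** 2)
    return t2, q, explained

# Hypothetical data: 40 "spectra" of 50 variables with rank-3 structure plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 50))
X += 0.05 * rng.normal(size=(40, 50))
X[5] += rng.normal(size=50)                        # planted outlier (illustration only)

t2, q, explained = pca_outlier_stats(X, n_components=3)
print("sample with largest residual:", int(np.argmax(q)))
print("explained variance fraction:", round(float(explained), 3))
```

Samples with unusually high residuals or T² values relative to the rest would then be inspected as outlier candidates.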

Calibration models were built based on partial least squares (PLS) regression using the PLS-1 algorithm [31], by regressing processed spectral data against the corresponding concentration values obtained through controlled additions.

After the calibration models had been developed for each compound, they were evaluated to determine their predictive ability. One of the most commonly used measures is the root mean square error of calibration (RMSEC), defined in Equation (1).
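Equation (1) is not reproduced in this text. A standard form consistent with the symbol definitions given for it would be the following, offered here as a reconstruction rather than a quotation:

```latex
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{N}} \tag{1}
```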

In Equation (1), y_i represents the experimental measurement result for sample i, ŷ_i denotes the model prediction for that sample and N stands for the number of samples used for calibration [32].

The calibration models were cross-validated using the leave-one-out technique, and their performance was assessed with the root mean square error of cross-validation (RMSECV), calculated according to Equation (1) with ŷ_i being the value predicted for sample i by the cross-validated model. This step also served to select the optimal number of PLS factors, taken as the number giving the lowest RMSECV [33]. Calibration and cross-validation used only 70% of the global dataset. The calibration models were also evaluated through the coefficients of determination for calibration (R²C) and cross-validation (R²CV).

The models were then tested with independent datasets consisting of the remaining 30% of the global dataset, selected randomly. After the test sets were projected onto the models, performance was evaluated again through the root mean square error of prediction (RMSEP), calculated with N as the number of samples in the prediction set and ŷ_i as the value predicted by the model for each sample i in that set.
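The calibration workflow described above (PLS-1 regression, leave-one-out cross-validation to choose the number of factors, and a random 70/30 split) can be sketched as follows. This is an illustrative Python/NumPy version under stated assumptions, not the authors' MATLAB code. The function names, the synthetic data, the random seed and the upper limit of 8 factors are all hypothetical, and the PLS-1 routine is a textbook deflation scheme that may differ in detail from the algorithm of reference [31].

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS-1 (single response) with mean centering and sequential deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = (f @ t) / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q) # regression vector
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def loo_rmsecv(X, y, n_factors):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_factors)
        preds[i] = pls1_predict(model, X[i:i + 1])[0]
    return rmse(y, preds)

# Hypothetical data standing in for processed spectra and reference values
rng = np.random.default_rng(0)
n, p = 40, 30
scores = rng.normal(size=(n, 3))
X = scores @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))
y = scores @ np.array([1.0, -0.5, 2.0]) + 0.05 * rng.normal(size=n)

# Random 70/30 split into calibration and test sets
idx = rng.permutation(n)
n_cal = int(round(0.7 * n))
cal, test = idx[:n_cal], idx[n_cal:]

# Choose the number of factors with the lowest RMSECV (limit of 8 is arbitrary)
max_factors = 8
rmsecv = [loo_rmsecv(X[cal], y[cal], k) for k in range(1, max_factors + 1)]
best_k = int(np.argmin(rmsecv)) + 1

model = pls1_fit(X[cal], y[cal], best_k)
rmsec = rmse(y[cal], pls1_predict(model, X[cal]))
rmsep = rmse(y[test], pls1_predict(model, X[test]))
print(best_k, rmsec, rmsecv[best_k - 1], rmsep)
```

Here the test samples play no part in selecting the number of factors, so RMSEP remains an independent estimate of prediction error.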

To express the previously defined errors as percentages, the root mean square errors (RMSE) were divided by the range of values of each respective dataset, as follows:
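The expression itself is missing from this text. From the description above it would take the following form, given here as a reconstruction, where y_max and y_min (symbol names assumed) are the highest and lowest reference values in the dataset concerned:

```latex
\mathrm{RMSE}\,(\%) = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} \times 100 \tag{2}
```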

The coefficient of determination can also provide a useful indication of the accuracy of the models. As mentioned before, it was determined for each compound at each step of model development: R²C for calibration, R²CV for the cross-validated models and R²P for the models tested with the prediction datasets.
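The text does not state which formula was used for the coefficient of determination. One common definition, assumed here purely for illustration, uses the same symbols as Equation (1) together with the mean reference value ȳ of the dataset concerned:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}
```

With this definition, the predicted values ŷ_i are taken from the calibration, cross-validation or prediction step to give R²C, R²CV or R²P, respectively. Some authors instead report the squared correlation between measured and predicted values, which can differ slightly.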

Additionally, the models' predictive ability was assessed using the range error ratio (RER) and the residual predictive deviation (RPD), as defined in Equations (3) and (4), respectively. These dimensionless parameters are considered indicators of a model's predictive ability, as described in the literature [23,24].
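Equations (3) and (4) are not reproduced in this text. In their commonly used forms, RER is the range of the reference values divided by the RMSEP, and RPD is the standard deviation of the reference values divided by the RMSEP. The short sketch below assumes those forms and uses the sample (n − 1) standard deviation. The function names and the five example values are hypothetical and are not data from the study.

```python
import math
import statistics

def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ref, y_pred)) / len(y_ref))

def rer(y_ref, y_pred):
    """Range error ratio: range of reference values / RMSE (assumed form)."""
    return (max(y_ref) - min(y_ref)) / rmse(y_ref, y_pred)

def rpd(y_ref, y_pred):
    """Residual predictive deviation: SD of reference values / RMSE (assumed form)."""
    return statistics.stdev(y_ref) / rmse(y_ref, y_pred)

# Hypothetical prediction-set values (arbitrary concentration units)
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 1.5, 3.5, 3.5, 5.5]

print("RMSEP:", rmse(y_ref, y_pred))   # every error is +/-0.5, so RMSEP = 0.5
print("RER:", rer(y_ref, y_pred))      # range 4.0 / 0.5 = 8.0
print("RPD:", rpd(y_ref, y_pred))
```

Both ratios grow as the prediction error shrinks relative to the spread of the reference values, which is why higher values are read as better predictive ability.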

To complete the evaluation of the results obtained, the limit of detection (LOD) was also calculated according to Equation (5):

Before PCA and PLS were applied, all datasets were mean centered. All calculations were performed using MATLAB version 8.3 (MathWorks, Natick, MA, USA).
