2.4. Multivariate Statistics

Maria Olga Varrà
Mauro Conter
Matteo Recchia
Giovanni Loris Alborali
Antonio Marco Maisano
Sergio Ghidini
Emanuela Zanardi

Multivariate analysis, in the form of Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA), was then applied to the SNV + 4Der pre-processed spectral data to develop classification models able to discriminate between the different categories of lung tissue. More specifically, OPLS-DA was used to relate the NIR spectral data to the results of the gross anatomopathological diagnosis performed by veterinarians, which served as the reference standard for assigning classes to the lung samples.

Two separate OPLS-DA models were developed: Model 1, discriminating N vs. C vs. P lung tissues, and Model 2, discriminating FPP vs. CBP vs. IP lung tissues.

Before model construction, the whole spectral dataset was randomly split into a calibration subset (75%) and a validation subset (25%). This procedure was applied for both OPLS-DA Model 1 and Model 2. Consequently, the calibration sets, containing the majority of the spectra, included N = 974 spectra for Model 1 and N = 441 spectra for Model 2. The calibration sets were used for model training and development, which were carried out with 7-fold cross-validation.
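The split and the cross-validation grouping described above can be sketched as follows. This is a minimal illustration and not the procedure implemented in SIMCA: the function names, the random seed, the rounding-down of the validation size, and the round-robin assignment of spectra to cross-validation groups are all assumptions made for the example. The totals (1298 and 588 spectra) are simply the sums of the calibration and validation sizes reported in this section.

```python
import random


def split_calibration_validation(n_spectra, validation_fraction=0.25, seed=0):
    """Randomly split spectrum indices into calibration and validation sets."""
    indices = list(range(n_spectra))
    random.Random(seed).shuffle(indices)  # seed is arbitrary (assumption)
    n_validation = int(n_spectra * validation_fraction)  # rounds down (assumption)
    validation = sorted(indices[:n_validation])
    calibration = sorted(indices[n_validation:])
    return calibration, validation


def cross_validation_folds(calibration, n_folds=7):
    """Assign calibration indices to n_folds groups in round-robin order."""
    return [calibration[k::n_folds] for k in range(n_folds)]


# Totals implied by the reported subset sizes: 974 + 324 and 441 + 147.
for name, total in (("Model 1", 1298), ("Model 2", 588)):
    cal, val = split_calibration_validation(total)
    folds = cross_validation_folds(cal)
    print(name, len(cal), len(val), [len(fold) for fold in folds])
```

In each cross-validation round, one of the seven groups would be held out while the model is fitted on the other six, so every calibration spectrum is predicted exactly once.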

The validation sets, comprising the remaining spectra not used for calibration (i.e., N = 324 spectra for Model 1 and N = 147 spectra for Model 2), were instead used for independent testing (external validation) of the calibrated OPLS-DA models. They served as unbiased sample sets to evaluate the overall performance of the models in discriminating N vs. C vs. P or FPP vs. CBP vs. IP lung tissues when applied to new, unknown lung samples.

Following the external validation stage, confusion matrices were generated for both Model 1 and Model 2 to summarize the predicted vs. the actual classifications of the samples included in the two validation spectral subsets. Starting from the confusion matrices, true-positive, true-negative, false-positive, and false-negative samples for each class of lung samples were identified. These values were then used to calculate specificity, sensitivity, accuracy, and precision percentage values, providing comprehensive performance metrics for the discriminant models [31].
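For a three-class problem, these metrics are usually derived one class at a time by treating that class as "positive" and the other two as "negative" (one-vs-rest). The sketch below assumes that convention; the function name, the class labels A/B/C, and the counts in the example matrix are hypothetical and are not the results of this study.

```python
def per_class_metrics(confusion, labels):
    """Compute one-vs-rest metrics (in %) from a square confusion matrix.

    confusion[i][j] is the number of samples whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                       # actual k, predicted k
        fn = sum(confusion[k]) - tp                # actual k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # actual other, predicted k
        tn = total - tp - fn - fp                  # everything else
        metrics[label] = {
            "sensitivity": 100 * tp / (tp + fn),
            "specificity": 100 * tn / (tn + fp),
            "precision": 100 * tp / (tp + fp),
            "accuracy": 100 * (tp + tn) / total,
        }
    return metrics


# Made-up counts for illustration only (rows: actual, columns: predicted).
labels = ["A", "B", "C"]
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
for label, values in per_class_metrics(confusion, labels).items():
    print(label, {name: round(value, 1) for name, value in values.items()})
```

The sketch does not guard against empty classes: if a class never occurs or is never predicted in the validation set, the corresponding denominator is zero and the metric is undefined.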

Finally, to identify the crucial NIR wavelengths that had a significant impact on predicting the class membership of the lung samples, the Variable Importance in Projection (VIP) index was employed, where VIP values equal to or greater than 1 were considered significant [32].
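The threshold of 1 follows from how VIP is commonly defined for PLS-type models: the squared VIP values average to 1 across all variables, so a value of 1 or more marks a variable with above-average influence. The sketch below uses the classical PLS formulation and is only an illustration; the VIP computation SIMCA applies to OPLS models may differ, and the function name, weights, and explained sums of squares shown here are hypothetical.

```python
import math


def vip_scores(weights, explained_ss):
    """Classical PLS VIP (assumed formulation, not necessarily SIMCA's).

    weights[a][j]   : weight of variable j in component a
    explained_ss[a] : sum of squares of Y explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        weighted = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)  # normalise each weight vector
            weighted += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * weighted / total_ss))
    return scores


# Made-up weights for 4 wavelengths and 2 components (illustration only).
weights = [
    [0.8, 0.1, 0.5, 0.3],
    [0.2, 0.9, 0.1, 0.4],
]
explained_ss = [3.0, 1.0]

vip = vip_scores(weights, explained_ss)
important = [j for j, value in enumerate(vip) if value >= 1.0]
print([round(value, 3) for value in vip], important)
```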

Multivariate analysis was performed using SIMCA® software v. 17.0.2.34594 (Sartorius Stedim Data Analytics AB, Umeå, Sweden).
