Partial least squares—discriminant analysis (PLS-DA)

Gabrielle Nepomuceno
Carolina Victoria Cruz Junho
Marcela Sorelli Carneiro-Ramos
Herculano da Silva Martinho

All spectra were pre-processed to make them comparable for statistical analysis. The baseline was corrected using the least-squares polynomial curve-fitting method described by Lieber and Mahadevan-Jansen [17]. All spectra were then normalized to the mean and scaled using Pareto scaling [18].
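The three pre-processing steps can be illustrated with a short Python sketch. This is not the authors' code. It assumes an iterative polynomial fit in the spirit of Lieber and Mahadevan-Jansen (points above the current fit are clipped to the fit before refitting), reads "normalized to mean" as division of each spectrum by its mean intensity, and takes Pareto scaling as mean-centering each wavenumber channel and dividing by the square root of its sample standard deviation. The function names, the polynomial order, and the fixed iteration count are hypothetical choices made for the example.

```python
import math


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y with A[i][j] = x[i] ** j.
    ata = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    aty = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(ata[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aty[r] - s) / ata[r][r]
    return coef


def subtract_baseline(shifts, intensities, degree=5, n_iter=100):
    """Iterative polynomial baseline removal (degree and n_iter are assumed values)."""
    lo, hi = min(shifts), max(shifts)
    # Rescale the axis to [-1, 1] to keep the normal equations well conditioned.
    t = [2.0 * (s - lo) / (hi - lo) - 1.0 for s in shifts]
    work = list(intensities)
    for _ in range(n_iter):
        coef = polyfit(t, work, degree)
        fit = [sum(c * ti ** k for k, c in enumerate(coef)) for ti in t]
        # Clip peaks: any point above the fit is replaced by the fit value.
        work = [min(w, f) for w, f in zip(work, fit)]
    return [y - f for y, f in zip(intensities, fit)]


def normalize_to_mean(spectrum):
    """Divide one spectrum by its mean intensity (assumed reading of the text)."""
    m = sum(spectrum) / len(spectrum)
    return [v / m for v in spectrum]


def pareto_scale(matrix):
    """Pareto-scale a matrix whose rows are spectra and columns are channels."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        denom = math.sqrt(sd) if sd > 0 else 1.0  # leave constant channels at zero
        scaled_cols.append([(v - mean) / denom for v in col])
    return [list(row) for row in zip(*scaled_cols)]
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of intense bands while keeping more of the original intensity structure.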

PLS-DA was then performed. PLS is a supervised multivariate method that uses linear regression on the original variables to predict class membership (Sham, 8D, and 15D for the heart and kidney groups). Here, the PLS regression was performed using the plsr function of the R pls package [16,19]. Classification and cross-validation were performed using the corresponding wrapper function in the caret package [19]. A permutation test was performed to assess the performance of class discrimination. In each permutation, a PLS-DA model was built between the data and the permuted class labels, using the optimal number of components determined by leave-one-out cross-validation for the model based on the original class assignment.

Class discrimination performance was measured using classification accuracy, R2, and Q2. Classification accuracy is based on prediction accuracy. R2 is the “goodness of fit”, or explained variation, and is based on the ratio of the between-group sum of squares to the within-group sum of squares. Q2 is the “goodness of prediction”, or predicted variation, and is calculated from cross-validation. In each cross-validation round, the predicted data are compared with the original data, and the squared errors are summed over all samples to give the predicted residual sum of squares (PRESS). For convenience, PRESS is divided by the initial sum of squares and subtracted from 1 so that Q2 is on the same scale as R2. Good predictions have a low PRESS and a high Q2, whereas a negative Q2 means that the model is not predictive at all or is overfitted [20-22].
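The Q2 calculation and the permutation test can be sketched as follows in Python. This is an illustration rather than the R workflow used in the study. It assumes that the "initial sum of squares" is the total sum of squares of the response about its mean, and it reports an empirical p-value as the fraction of permutations that score at least as well as the original labels. The function names are hypothetical. mean_gap is only a toy stand-in for the score of a refitted PLS-DA model; it is not part of the method.

```python
import random


def q2_from_press(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / SS_total, where y_cv_pred are cross-validated predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    return 1.0 - press / ss_total


def permutation_p_value(score_fn, data, labels, n_perm=1000, seed=0):
    """Empirical p-value of score_fn(data, labels) against label permutations."""
    rng = random.Random(seed)
    observed = score_fn(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between data and class labels
        if score_fn(data, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def mean_gap(values, labels):
    """Toy score (assumption): difference between the means of class 1 and class 0."""
    a = [v for v, l in zip(values, labels) if l == 1]
    b = [v for v, l in zip(values, labels) if l == 0]
    return sum(a) / len(a) - sum(b) / len(b)
```

In a real analysis, score_fn would refit the model on the permuted labels with the fixed number of components and return a performance measure such as accuracy or Q2.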

Two measures were used to quantify the importance of each vibrational band frequency in the PLS-DA model. The first, Variable Importance in Projection (VIP), is a weighted sum of squares of the PLS loadings that takes into account the amount of explained spectral intensity variation in each dimension. The second is based on the weighted sum of the PLS regression coefficients, where the weights are a function of the reduction in the sums of squares across the PLS components. For multiple-group analysis, the same number of predictors is built for each group, and the average of the feature coefficients was used to indicate the overall coefficient-based importance.
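A minimal Python sketch of the two importance measures is given below. It assumes the commonly used VIP formulation, in which the squared, normalized PLS weight of each variable is weighted by the response variation explained by each component. The inputs (weights, scores, and y-loadings of an already fitted model) and the function names are hypothetical. The second helper shows only the final averaging across groups, and its use of absolute coefficient values is an assumption.

```python
import math


def vip_scores(weights, scores, y_loadings):
    """VIP per variable.

    weights[a][j]  : PLS weight of variable j in component a
    scores[a][i]   : score of sample i on component a
    y_loadings[a]  : y-loading of component a
    """
    n_vars = len(weights[0])
    # Response variation explained by each component: q_a^2 * (t_a . t_a).
    explained = [q * q * sum(t * t for t in t_a)
                 for q, t_a in zip(y_loadings, scores)]
    total = sum(explained)
    vip = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained):
            norm_sq = sum(w * w for w in w_a)  # normalize each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        vip.append(math.sqrt(n_vars * acc / total))
    return vip


def mean_abs_coefficients(coefs_by_group):
    """Average absolute coefficient per variable across groups (rows are groups)."""
    n_groups = len(coefs_by_group)
    return [sum(abs(c) for c in col) / n_groups for col in zip(*coefs_by_group)]
```

With this formulation the squared VIP values sum to the number of variables, so a value above 1 marks a band that contributes more than average.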
