Because the geographic variables were numerous and highly correlated, we used a dimension reduction technique to simplify model selection. We chose partial least squares (PLS) because it is designed for large sets of collinear variables: it constructs components that maximize covariance between the predictors and the outcome, and the reduction in dimension limits overfitting.15 Incorporating PLS regression into the prediction approach therefore both guards against overfitting and makes effective use of the predictors in a high-dimensional geographic database.15 We performed PLS separately for each modeling year. We considered models including the first 2, 3, 4, or 5 PLS components and selected the optimal number of components by cross-validated performance, as described in section 2.7.