The reported optimum similarity indices, selected according to the agreement between spectral and compositional similarity, were evaluated in terms of their predictive power. For each sample in the test dataset, all six properties were predicted with PLSR models built on the similar samples matched in the training dataset by each of the five similarity indices.
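The local (memory-based) modelling step can be illustrated with a minimal Python/NumPy sketch. For every test spectrum, it ranks the training spectra by a dissimilarity score, keeps the k most similar ones, fits a PLS1 model on that neighbourhood only, and predicts. The five similarity indices used in the study are not named in this section, so `euclidean_dissimilarity` and `correlation_dissimilarity` are hypothetical stand-ins. The NIPALS PLS1 fit, all function names and the `n_components` parameter are likewise assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS regression (PLS1) with NIPALS.

    Returns (x_mean, y_mean, coef) such that y_hat = (x - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centred copies, deflated component by component
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm_w = np.linalg.norm(w)
        if norm_w < 1e-12:  # response fully explained: stop adding components
            break
        w = w / norm_w            # X weights
        t = Xr @ w                # scores
        tt = t @ t
        p = Xr.T @ t / tt         # X loadings
        q = (yr @ t) / tt         # y loading
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - q * t           # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:  # constant response: predict the neighbourhood mean
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef


# Hypothetical dissimilarity measures (smaller = more similar); the study's
# five similarity indices are not named in this section.
def euclidean_dissimilarity(X, x):
    return np.linalg.norm(X - x, axis=1)


def correlation_dissimilarity(X, x):
    Xc = X - X.mean(axis=1, keepdims=True)
    xc = x - x.mean()
    r = (Xc @ xc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(xc))
    return 1.0 - r


def local_plsr_predict(X_train, y_train, X_test, k, dissimilarity, n_components):
    """Predict each test sample from a PLS1 model fitted on its k most similar training samples."""
    preds = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        nearest = np.argsort(dissimilarity(X_train, x))[:k]
        x_mean, y_mean, coef = fit_pls1(X_train[nearest], y_train[nearest], n_components)
        preds[i] = (x - x_mean) @ coef + y_mean
    return preds
```

With spectra as rows of `X_train` and `X_test`, a call such as `local_plsr_predict(X_train, y_train, X_test, k=250, dissimilarity=euclidean_dissimilarity, n_components=10)` would mirror the neighbourhood size used in the study; the number of PLS components here is an arbitrary placeholder. Refitting one small model per test sample is what lets each prediction adapt to its local region of spectral space, at the cost of one fit per query.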
The number of similar samples selected from the training dataset strongly affects model performance [19]; although important, this parameter was not the focus of the present study. Sizes of n = 5, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 400 and 500 were tested for the prediction of SOC. Predictive accuracy was highest and stabilized at a size of about 250, so this size was used in all subsequent analyses of the other physicochemical properties (S1 Fig). PLSR model performance was evaluated by the ratio of performance to deviation (RPD; Eq 7):

$$\mathrm{RPD} = \frac{SD}{\mathrm{RMSEP}} \qquad (7)$$
where SD is the standard deviation of the observed property values in the test dataset, and RMSEP is the RMSE of prediction (see Eq 4). For each sample in the test dataset, the 250 samples most similar to it in spectral space (as measured by each similarity index) were selected from the training dataset to build the PLSR models used to predict the soil properties. We followed the criteria proposed by Chang and Laird [20] to evaluate PLSR model performance: (1) RPD < 1.4, the model cannot predict the target property; (2) 1.4 ≤ RPD < 2.0, the model has moderate predictive performance; and (3) 2.0 ≤ RPD < 2.5, the model predicts the target property well.
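The evaluation step can be sketched in a few lines of Python, assuming NumPy. Eq 4 is not reproduced in this section, so `rmsep` uses the standard root-mean-squared-error formula. SD is taken as the sample standard deviation (`ddof=1`), which is also an assumption. `chang_laird_class` encodes the three ranges exactly as listed above. Values of 2.5 or more fall outside those ranges, so the sketch flags them rather than guessing a label. The hypothetical helper `rpd_by_k` shows how RPD could be tabulated across the candidate neighbourhood sizes (5 to 500) to find where accuracy peaks and levels off.

```python
import numpy as np


def rmsep(observed, predicted):
    """Root mean squared error of prediction (standard RMSE, assumed to match Eq 4)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def rpd(observed, predicted):
    """Eq 7: RPD = SD / RMSEP, with SD the sample standard deviation (ddof=1, assumed)."""
    sd = float(np.std(np.asarray(observed, dtype=float), ddof=1))
    return sd / rmsep(observed, predicted)


def chang_laird_class(value):
    """Map an RPD value onto the three ranges listed in the text."""
    if value < 1.4:
        return "not able to predict"
    if value < 2.0:
        return "moderate"
    if value < 2.5:
        return "good"
    return "outside the listed ranges"  # no class is listed for RPD >= 2.5


def rpd_by_k(observed, predictions_by_k):
    """RPD for each candidate neighbourhood size k, ordered by k (hypothetical helper)."""
    return {k: rpd(observed, pred) for k, pred in sorted(predictions_by_k.items())}
```

Here `predictions_by_k` would map each tested size (for example 250) to the test-set predictions obtained with that many neighbours. Choosing the final size remains a judgement on the resulting RPD curve, as in the text.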