## 2.5. Model Tuning and Validation

This protocol is extracted from the research article:
Application of Low-Cost MEMS Spectrometers for Forest Topsoil Properties Prediction. Sensors (Basel), 7 June 2021.

## Procedure

Model calibration was carried out by means of a ten-fold cross validation, and independent validation was performed using part of the data as a test set. Accordingly, we split our data into a calibration and a validation set at a ratio of 70:30. In this way, 70% of the data is used to calibrate the model, while the remaining 30% is held back and used for independent validation of the prediction performance after the regression models are calibrated.
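The 70:30 split can be sketched as follows. This is an illustration with synthetic data, not the authors' code (the study itself was implemented in R); array names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for the spectra (X) and one soil property (y).
n_samples, n_bands = 100, 64
X = rng.normal(size=(n_samples, n_bands))
y = rng.lognormal(size=n_samples)

# Random 70:30 split into calibration and validation sets.
idx = rng.permutation(n_samples)
n_cal = int(0.7 * n_samples)
cal_idx, val_idx = idx[:n_cal], idx[n_cal:]
X_cal, y_cal = X[cal_idx], y[cal_idx]
X_val, y_val = X[val_idx], y[val_idx]
```

The permutation guarantees that the two sets are disjoint and together cover all samples.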

We calibrated models for Oh and Ah horizons separately for the regional BZE data and the local Zellwald data. For the regional BZE data, we also carried out an approach using samples from both horizons. This can be reasonable, as the separation of the horizons can be difficult during sampling. In this case, soil samples of the two horizons from the same sampling point were always kept together, as spatial dependence between the horizons might otherwise lead to overoptimistic results.

The accuracy of the selected regression algorithms can be improved by tuning specific hyperparameters. For PLSR, the number of components has to be optimized; for Cubist regression, the number of committees and neighbours has to be selected. To ensure a robust choice of hyperparameter values for each approach, we used a ten-fold cross validation during model calibration. In this procedure, we randomly split the data into ten subsets of equal size; again, samples from the same sampling point were kept in the same split. Model calibration is then performed with nine of the subsets, and the remaining subset is used for validation of the model. This procedure is repeated ten times, until each of the ten subsets has been used once for validation of the model performance. In total, ten models are built, each using a different part of the training data, resulting in ten estimates of model performance that together cover all data in the calibration set. The estimated prediction errors of all ten models are then combined [44].

This ten-fold cross validation was repeated several times, using different values for the hyperparameters of the algorithms. The tuning was done by grid search. For PLSR, one to 20 components were tested. For Cubist, we selected the ideal combination of hyperparameters by searching values from 1 to 50 for committees and from zero to nine for neighbours.
The hyperparameter values resulting in the lowest error were then chosen and used to evaluate the performance on the test data set, which consists of data the model has not yet seen. For regression model calibration, log-transformed values of C and N content were used; all values were back-transformed prior to validation of model performance. To assess the models' performance on the test data, we used different error measures. To evaluate the deviation of the predicted from the measured values, we used the root mean squared error (RMSE). The RMSE is computed as shown in Equation (3):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2} \qquad (3)$$

where $y_i$ are the predicted and $x_i$ the observed values.

Apart from error measures based on the deviations of the predictions from the actual observed values, overall model performance was assessed using the coefficient of determination, $R^2$, described in Equation (4):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (4)$$

where $y_i$ are the observed values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the observed values.

In addition, we computed the ratio of performance to interquartile distance (RPIQ). It is calculated as shown in Equation (5) [49]:

$$\mathrm{RPIQ} = \frac{Q_3 - Q_1}{\mathrm{RMSE}} \qquad (5)$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the observed values.
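Following the definitions above, the three error measures can be implemented as below. This is a minimal sketch (the study used R), with RPIQ computed on the observed values of the test set:

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error, Equation (3)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - observed) ** 2))

def r_squared(observed, predicted):
    """Coefficient of determination, Equation (4)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance, Equation (5)."""
    observed = np.asarray(observed)
    q1, q3 = np.percentile(observed, [25, 75])
    return (q3 - q1) / rmse(observed, predicted)
```

Note that, as the text states, predictions of log-transformed C and N content would be back-transformed before being passed to these functions.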

Chang et al. [50] suggested a system to rank RPD values. We adapted it for RPIQ by multiplying the thresholds by 1.34896 (the interquartile range of a Gaussian distribution equals 1.34896 × SD). Models with RPIQ > 2.70 are considered good, models with 1.89 < RPIQ < 2.70 are moderate, and models with RPIQ < 1.89 are considered poor.
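The ranking rule above amounts to a simple threshold check (a sketch; the function name is illustrative):

```python
def rank_rpiq(value):
    """Classify model quality from an RPIQ value using the thresholds
    from the text: Chang et al.'s RPD cut-offs rescaled by 1.34896."""
    if value > 2.70:
        return "good"
    if value > 1.89:
        return "moderate"
    return "poor"
```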

In general, the models' accuracy is indicated by a low RMSE and high $R^2$ and RPIQ values. All calculations were carried out by means of the R language for statistical computing [36] and the caret package for classification and regression training [51].

