We conducted two trials to determine which regularized regression algorithms achieved the best prediction performance while avoiding overfitting to the training datasets. In Trial 1, we randomly split all datasets into training (80%) and test (20%) sets to evaluate predictive power for interpolation. In Trial 2, the datasets derived from a single article were held out as the test set, so that the training and test datasets shared no relationship; this allowed us to assess predictability for extrapolation. Both trials adopted leave-one-out (LOO) cross-validation, in which provisional models were repeatedly generated from N-1 training datasets (where N is the number of training datasets) and the one remaining dataset was used for model validation. The final coefficients were the averages over the N generated models. To evaluate prediction performance, we calculated mean squared errors (MSEs). We also generated a series of predicted LRVs by changing, stepwise, the explanatory variables with larger coefficients, to explore the out-of-predictable range in which a model produced obviously wrong predictions, such as a negative LRV.
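The LOO procedure above (fit N models on N-1 samples, validate on the held-out sample, average the coefficients, and report the MSE) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the data are synthetic, the ridge penalty stands in for whichever regularized regression is used, and all variable names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): n samples, p explanatory variables.
n, p = 40, 3
X = rng.normal(size=(n, p))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=n)

def ridge_fit(X, y, alpha=0.1):
    """Closed-form ridge regression: solve (X'X + alpha*I) w = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)

# Leave-one-out CV: N provisional models, each trained on N-1 samples
# and validated on the single held-out sample.
coefs, sq_errors = [], []
for i in range(n):
    mask = np.arange(n) != i
    w = ridge_fit(X[mask], y[mask])
    coefs.append(w)
    sq_errors.append((X[i] @ w - y[i]) ** 2)

avg_coef = np.mean(coefs, axis=0)   # final coefficients = average of N models
loo_mse = float(np.mean(sq_errors)) # LOO mean squared error

# Stepwise sweep of the variable with the largest |coefficient| to probe
# the out-of-predictable range (e.g. where a predicted LRV turns negative).
j = int(np.argmax(np.abs(avg_coef)))
probe = X.mean(axis=0)
for value in np.linspace(-10.0, 10.0, 9):
    x = probe.copy()
    x[j] = value
    pred = x @ avg_coef  # a clearly implausible prediction flags the boundary
```

The sweep at the end only varies one input at a time from the sample mean; how far outside the observed range one steps, and which predictions count as "obviously wrong," depend on the application.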