Monte Carlo cross-validation (MCCV) and Y-shuffling were used to validate the models. For MCCV, one random point was removed from the training dataset, and a multiple linear regression was fitted using the predictor variables of the original model. The RMS error between the left-out point's predicted and true values was then calculated with the newly generated model. This process was repeated 10,000 times, and the average RMS error was calculated across all 10,000 repeats. We then repeated this procedure leaving out two to five points at a time in order to assess the robustness of the original models and their degree of overfitting to the training set data.
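The MCCV procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the predictor matrix `X` and response `y` are placeholders for the model's descriptors and the RMS fluctuations, and the regression is fitted with an ordinary least-squares solve.

```python
import numpy as np

def mccv_rms_error(X, y, n_leave_out=1, n_repeats=10000, seed=0):
    """Monte Carlo cross-validation: repeatedly remove n_leave_out random
    points, refit a multiple linear regression on the remainder, and score
    the held-out points; return the average RMS error over all repeats."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Prepend an intercept column to the predictor matrix
    Xb = np.column_stack([np.ones(n), np.asarray(X)])
    errors = np.empty(n_repeats)
    for i in range(n_repeats):
        held_out = rng.choice(n, size=n_leave_out, replace=False)
        mask = np.ones(n, dtype=bool)
        mask[held_out] = False
        # Refit the regression without the held-out point(s)
        beta, *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        # RMS error of the held-out point(s) against the new model
        resid = Xb[~mask] @ beta - y[~mask]
        errors[i] = np.sqrt(np.mean(resid ** 2))
    return errors.mean()
```

Increasing `n_leave_out` from 1 to 5 reproduces the multi-point variants of the protocol; a model that overfits the training set will show a sharply growing average RMS error as more points are withheld.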
For Y-shuffling, three different protocols were used to verify the statistical significance of each model as well as its overall performance.17 In the first protocol, the RMS fluctuations of each model were shuffled and new multiple linear regressions were performed with the unshuffled predictor variables. In the second, new multiple linear regressions of the unshuffled RMS fluctuations were performed with predictor variables chosen at random from the 1665-term matrix. In the third, new multiple linear regressions of the unshuffled RMS fluctuations were performed with random integers, drawn from the range 0 to 32767, as predictor variables. Each protocol was repeated 10,000 times, and the statistical significance of the original models was determined by comparing the correlation coefficient of the original model to the correlation coefficients of the 1,000 best-performing validation models.
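The three null-model protocols can be sketched in a single routine. This is an illustrative reimplementation under assumed names: `X` and `y` stand for the original predictors and RMS fluctuations, and `pool` stands in for the full 1665-term descriptor matrix from which random columns are drawn in the second protocol.

```python
import numpy as np

def yshuffle_r(X, y, protocol="shuffle_y", pool=None,
               n_repeats=10000, int_max=32767, seed=0):
    """Correlation coefficients of null models under three protocols:
    'shuffle_y'   - permute the response, keep the original predictors;
    'random_cols' - keep the response, draw predictor columns at random
                    from `pool` (the full descriptor matrix);
    'random_int'  - keep the response, use random integers in [0, int_max]."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rs = np.empty(n_repeats)
    for i in range(n_repeats):
        if protocol == "shuffle_y":
            Xi, yi = X, rng.permutation(y)
        elif protocol == "random_cols":
            cols = rng.choice(pool.shape[1], size=p, replace=False)
            Xi, yi = pool[:, cols], y
        else:  # 'random_int'
            Xi = rng.integers(0, int_max + 1, size=(n, p)).astype(float)
            yi = y
        # Fit a multiple linear regression with an intercept
        Xb = np.column_stack([np.ones(n), Xi])
        beta, *_ = np.linalg.lstsq(Xb, yi, rcond=None)
        # Correlation between fitted and observed values
        rs[i] = np.corrcoef(Xb @ beta, yi)[0, 1]
    return rs

def top_k_mean(rs, k=1000):
    """Mean correlation of the k best-performing null models, for
    comparison against the original model's correlation coefficient."""
    return np.sort(rs)[-k:].mean()
```

If the original model's correlation coefficient clearly exceeds `top_k_mean(rs)` for all three protocols, the fit is unlikely to be a chance correlation.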