We used three statistical indices to summarize the distributions of errors between the data and the predictions made by the single-model and multi-model ensembles, and to compare their predictive performance: the average error (AE), the relative error (RE), and the relative root mean squared error (ReRMSE) (Ramin et al., 2012; Simidjievski et al., 2015a, 2015b; Willmott and Matsuura, 2005). Let $y_i$ be the observed baseline prevalence in age group $i$, $N_k$ be the number of members comprising model $k$, and $\hat{y}_{kni}$ be the model-predicted prevalence in age group $i$ for member $n$ of model $k$. The AE is calculated as:
whereas the RE is given by:
and the ReRMSE is calculated as:
where $\bar{y}$ represents the overall mean mf prevalence. The AE and RE measure the average difference between model predictions and observed values (in absolute and relative terms, respectively), whereas the ReRMSE normalizes these differences by the standard deviation of the system variable (mf prevalence). This normalization allows model performance to be compared across system variables measured on different scales, or across data sets exhibiting considerable between-study variance in the measured system variable. Smaller values of ReRMSE indicate better predictive performance (Simidjievski et al., 2015a, 2015b).
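As a minimal sketch of how the three indices might be computed for one ensemble, the Python function below takes the observed age-profile and a matrix of member predictions. The exact functional forms are assumptions, not taken from the equations of this protocol: AE is treated as the signed mean difference between predictions and observations, RE as the summed absolute difference divided by the summed observations (averaged over members), and ReRMSE as the root mean squared error divided by the spread of the observations around their mean. The function name `error_indices` and its arguments are hypothetical.

```python
import numpy as np


def error_indices(observed, predicted):
    """Return (AE, RE, ReRMSE) for one ensemble.

    observed  : array of shape (I,), observed prevalence per age group
    predicted : array of shape (N_k, I), one row per ensemble member

    The formulas below are ASSUMED forms chosen for illustration.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.atleast_2d(np.asarray(predicted, dtype=float))
    n_members = predicted.shape[0]

    # Member-by-age-group differences (observations broadcast over members).
    diff = predicted - observed

    # Assumed AE: signed mean difference over all members and age groups.
    ae = diff.mean()

    # Assumed RE: total absolute difference relative to total observed
    # prevalence, averaged over members.
    re = np.abs(diff).sum() / (n_members * observed.sum())

    # Assumed ReRMSE: mean squared error normalized by the mean squared
    # deviation of the observations from their overall mean.
    spread = ((observed - observed.mean()) ** 2).mean()
    rermse = np.sqrt((diff ** 2).mean() / spread)

    return ae, re, rermse
```

Under these assumed forms, a ReRMSE below 1 means the ensemble predicts the age-profile better than a constant equal to the overall mean prevalence would.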
To test whether the power of the multi-model ensemble derives from the diversity of its constituent single-model ensembles, we measured the diversity of each single-model ensemble and correlated it with the performance improvement of the multi-model ensemble over that single-model ensemble (Simidjievski et al., 2015a, 2015b). To quantify the diversity of a single-model ensemble, we measured the average pairwise difference between its members:
where $|N_k|$ denotes the number of members in single-model ensemble $k$, $I$ the number of measurements in the mf age-profile data set, $m_1$ and $m_2$ two members of single-model ensemble $k$, and $x_{km_1i}$ and $x_{km_2i}$ the values simulated by these members at age group $i$.
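The average pairwise difference can be sketched as follows. This is an illustration under a stated assumption: the difference between two members is taken to be the root mean squared difference of their simulated values across the $I$ age groups, averaged over all distinct pairs of members. The function name `ensemble_diversity` is hypothetical.

```python
from itertools import combinations

import numpy as np


def ensemble_diversity(members):
    """Mean pairwise difference between members of one single-model ensemble.

    members : array of shape (N_k, I), simulated prevalence per member and
              age group.

    ASSUMPTION: the pairwise difference is the root mean squared difference
    over age groups; the protocol's own definition may differ.
    """
    members = np.asarray(members, dtype=float)
    if members.ndim != 2 or members.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two members")

    # One distance per unordered pair (m1, m2) with m1 != m2.
    distances = [
        np.sqrt(np.mean((a - b) ** 2)) for a, b in combinations(members, 2)
    ]
    return float(np.mean(distances))
```

Averaging over unordered pairs gives the same value as averaging over all ordered pairs with $m_1 \neq m_2$, because the distance is symmetric.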
To assess the performance improvement of the multi-model ensemble over a single-model ensemble, we calculated:
where MM and SM, respectively, represent multi-model and single-model ensembles.
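A short sketch of this step is given below, assuming that the improvement is expressed as the proportional reduction in ReRMSE achieved by the multi-model ensemble, i.e. 1 - ReRMSE(MM) / ReRMSE(SM). This form is an assumption made for illustration; a ratio-based definition would serve the same purpose. The function names are hypothetical, and the correlation with diversity is computed as a Pearson coefficient.

```python
import numpy as np


def relative_improvement(rermse_mm, rermse_sm):
    """ASSUMED form: proportional reduction in ReRMSE of MM relative to SM.

    Positive values mean the multi-model ensemble (MM) outperforms the
    single-model ensemble (SM); zero means equal performance.
    """
    rermse_sm = np.asarray(rermse_sm, dtype=float)
    return 1.0 - np.asarray(rermse_mm, dtype=float) / rermse_sm


def diversity_improvement_correlation(diversity, improvement):
    """Pearson correlation between ensemble diversity and improvement."""
    return float(np.corrcoef(diversity, improvement)[0, 1])
```

For example, with one ReRMSE value per single-model ensemble in `rermse_sm` and the corresponding multi-model value in `rermse_mm`, `relative_improvement(rermse_mm, rermse_sm)` returns one improvement per ensemble, which can then be passed with the diversity values to `diversity_improvement_correlation`.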