2.5. Evaluation metrics

Morgan E. Smith; Brajendra K. Singh; Michael A. Irvine; Wilma A. Stolk; Swaminathan Subramanian; T. Déirdre Hollingsworth; Edwin Michael

This method is extracted from research article: Epidemics, Mar 2017

Predicting lymphatic filariasis transmission and elimination dynamics using a multi-model ensemble framework

DOI: 10.1016/j.epidem.2017.02.006

We used three statistical indices to measure the central tendencies of the distributions of errors between the data and the predictions made by the single-model and multi-model ensembles, and to compare their predictive performance: the average error (AE), the relative error (RE), and the relative root mean squared error (ReRMSE) (Ramin et al., 2012; Simidjievski et al., 2015a; Simidjievski et al., 2015b; Willmott and Matsuura, 2005). Let $y_i$ be the observed baseline prevalence in age group $i$, $N_k$ be the number of members comprising model $k$, and $x_{kni}$ be the model-predicted prevalence in age group $i$ for member $n$ of model $k$. The AE is calculated as:

whereas the RE is given by:

and the ReRMSE is calculated as:

where $\bar{y}$ represents the overall mean microfilaria (mf) prevalence. The AE and RE measure the average difference between model predictions and observed values, as an absolute and a relative measure, respectively. The ReRMSE instead normalizes these differences by the standard deviation of the system variable (mf prevalence), which allows model performance to be compared either across system variables measured on different scales or across data sets exhibiting considerable between-study variance in the measured system variable. Smaller values of ReRMSE indicate better predictive performance (Simidjievski et al., 2015a; Simidjievski et al., 2015b).
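As an illustration, the three indices could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes AE is the mean absolute difference over all members and age groups, RE is the same difference divided by the observed value (so every $y_i$ must be positive), and ReRMSE is computed per member as the root of the summed squared errors over the summed squared deviations of the observations from $\bar{y}$, then averaged over members. The function name `evaluation_metrics` and the array layout are hypothetical.

```python
import numpy as np


def evaluation_metrics(pred, obs):
    """Return (AE, RE, ReRMSE) for one single-model or multi-model ensemble.

    pred : array of shape (N_k, I), predicted prevalence per member and age group
    obs  : array of shape (I,), observed prevalence per age group (assumed > 0)

    The exact formulas are assumptions; see the accompanying text.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)

    # Errors x_kni - y_i, broadcast across the N_k members.
    diff = pred - obs

    # Assumed AE: mean absolute difference over members and age groups.
    ae = float(np.mean(np.abs(diff)))

    # Assumed RE: absolute difference relative to the observed value.
    re = float(np.mean(np.abs(diff) / obs))

    # Assumed ReRMSE: squared errors normalized by the spread of the
    # observations around their overall mean, averaged over members.
    spread = np.sum((obs - obs.mean()) ** 2)
    rermse = float(np.mean(np.sqrt(np.sum(diff ** 2, axis=1) / spread)))

    return ae, re, rermse


# Toy data (made up for illustration): 2 members, 3 age groups.
observed = [10.0, 20.0, 30.0]
predicted = [[12.0, 18.0, 33.0],
             [8.0, 22.0, 27.0]]
ae, re, rermse = evaluation_metrics(predicted, observed)
print(ae, re, rermse)
```

With this normalization, a ReRMSE of 1 corresponds to a member that predicts no better than the mean of the observations, which is why smaller values indicate better performance.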

To test whether the power of the multi-model ensemble is based on the diversity of its constituent single-model ensembles, we measured the diversity of the single-model ensembles and correlated it with the performance improvement of the multi-model ensemble over the single-model ensembles (Simidjievski et al., 2015a; Simidjievski et al., 2015b). To quantify the diversity of the single models, we measured the average pairwise difference of the single-model members:

where $|N_k|$ denotes the number of members in the single-model ensemble $k$, $I$ the number of measurements in the mf age-profile data set, $m_1$ and $m_2$ two members of the single-model ensemble $k$, and $x_{km_1i}$ and $x_{km_2i}$ the simulated values of these outcomes at age group $i$.
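A diversity measure of this kind could be sketched as follows. The code assumes that the difference between two members $m_1$ and $m_2$ is the mean absolute difference of their simulated values over the $I$ measurements, and that the average is taken over all unordered pairs of distinct members; the source may use a different distance (for example a squared difference) or a different normalization. The function name `ensemble_diversity` is hypothetical.

```python
from itertools import combinations

import numpy as np


def ensemble_diversity(pred):
    """Average pairwise difference between members of one single-model ensemble.

    pred : array of shape (N_k, I), simulated values per member and age group.
    The distance (mean absolute difference) is an assumption.
    """
    pred = np.asarray(pred, dtype=float)
    n_members = pred.shape[0]
    if n_members < 2:
        raise ValueError("at least two members are needed to form a pair")

    # One distance per unordered pair (m1, m2) with m1 != m2.
    pair_distances = [
        np.mean(np.abs(pred[m1] - pred[m2]))
        for m1, m2 in combinations(range(n_members), 2)
    ]
    return float(np.mean(pair_distances))


# Toy ensemble (made up for illustration): 3 members, 2 age groups.
members = [[1.0, 2.0],
           [3.0, 4.0],
           [5.0, 6.0]]
print(ensemble_diversity(members))
```

An ensemble whose members all produce the same age profile has a diversity of zero under this definition, and the value grows as the members' predictions spread apart.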

To assess the performance improvement of the multi-model ensemble over a single-model ensemble, we calculated:

where MM and SM, respectively, represent multi-model and single-model ensembles.
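The improvement measure and its correlation with diversity could be sketched as below. Two assumptions are made and should be checked against the source article: the improvement is taken to be the relative reduction in ReRMSE when moving from the single-model (SM) to the multi-model (MM) ensemble, and the correlation is taken to be Pearson's, since the text does not name the coefficient. Both function names are hypothetical.

```python
import numpy as np


def relative_improvement(rermse_mm, rermse_sm):
    """Assumed improvement: fractional reduction in ReRMSE of MM relative to SM.

    Positive values mean the multi-model ensemble has the lower (better) ReRMSE.
    """
    return (rermse_sm - rermse_mm) / rermse_sm


def diversity_improvement_correlation(diversities, improvements):
    """Pearson correlation between diversity and improvement (assumed choice)."""
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # element is the correlation between the two input series.
    return float(np.corrcoef(diversities, improvements)[0, 1])


# Toy values (made up for illustration), one entry per single-model ensemble.
rermse_sm = np.array([1.0, 0.8, 0.6])
rermse_mm = 0.5
improvements = relative_improvement(rermse_mm, rermse_sm)
diversities = [3.0, 2.0, 1.0]
print(improvements)
print(diversity_improvement_correlation(diversities, improvements))
```

A positive correlation in such an analysis would be consistent with the multi-model ensemble gaining most over the single-model ensembles that are most diverse.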

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
