In a conformal prediction setting, the observed error rate of the predictions is theoretically proven not to exceed the specified significance level, provided that the calibration and test data are exchangeable. Conversely, deviations between these two values may indicate data drift (or other causes, such as a test set that is too small). The level of calibration can be visualised in a so-called calibration plot, where the observed error rate (y-axis) is plotted against the significance level (the desired error rate, x-axis). For valid (well-calibrated) models, the values should lie on the diagonal line. Deviations from this behaviour signal deviations from perfect calibration. We also include efficiency in the plot, calculated as the fraction of single-class predictions. These plots, hereafter called calibration and efficiency plots (CEPs), were used in this work to assess model calibration and efficiency (see Fig. 2). As a measure of the level of calibration, we use the root-mean-square deviation (RMSD) between the specified significance level and the observed error rate.
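The quantities plotted in a CEP can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names `calibration_curve` and `rmsd` are hypothetical, the p-values are assumed to be supplied as an array with one column per class, and a class is assumed to be included in the prediction set when its p-value is strictly greater than the significance level.

```python
import numpy as np


def calibration_curve(p_values, y_true, significance_levels):
    """Observed error rate and efficiency at each significance level.

    p_values: array of shape (n_samples, n_classes) with conformal p-values.
    y_true: integer class labels (0 = inactive, 1 = active).
    Assumption: a class is part of the prediction set if its p-value > level.
    """
    p_values = np.asarray(p_values, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    rows = np.arange(len(y_true))
    error_rates, efficiencies = [], []
    for eps in significance_levels:
        pred_sets = p_values > eps
        # An error occurs when the true class is missing from the prediction set.
        error_rates.append(1.0 - pred_sets[rows, y_true].mean())
        # Efficiency: fraction of prediction sets containing exactly one class.
        efficiencies.append((pred_sets.sum(axis=1) == 1).mean())
    return np.array(error_rates), np.array(efficiencies)


def rmsd(significance_levels, error_rates):
    """Root-mean-square deviation between specified and observed error rates."""
    diff = np.asarray(error_rates, dtype=float) - np.asarray(significance_levels, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data, invented purely for illustration.
levels = np.array([0.1, 0.4])
p = [[0.9, 0.05], [0.3, 0.6], [0.02, 0.8], [0.5, 0.5]]
y = [0, 1, 0, 1]
errors, effs = calibration_curve(p, y, levels)
print(errors, effs, rmsd(levels, errors))
```

An RMSD of zero corresponds to error rates lying exactly on the diagonal; larger values indicate stronger deviations from perfect calibration.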
Calibration and efficiency plot. The dark lines show the mean error rate for the active (dark red) and inactive (dark blue) compounds. For a well-calibrated model, the error rate ideally follows the dashed diagonal line. The light-coloured lines show the mean efficiencies, expressed as the ratio of single-label sets, for the active (light red) and inactive (light blue) compounds. The shaded areas indicate the respective standard deviations within the fivefold CV. Class 0: inactive compounds; class 1: active compounds.
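The class-wise curves and shaded bands described in the caption could be derived roughly as follows. This is a sketch under stated assumptions, not the code used to produce the figure: `classwise_curve` and `summarise_folds` are hypothetical names, p-values greater than the significance level are assumed to put a class into the prediction set, and the standard deviation is taken as the population standard deviation (NumPy's default) across folds.

```python
import numpy as np


def classwise_curve(p_values, y_true, label, significance_levels):
    """Error rate and efficiency restricted to compounds of one true class.

    Assumption: a class is part of the prediction set if its p-value > level.
    """
    p = np.asarray(p_values, dtype=float)[np.asarray(y_true) == label]
    errors, efficiencies = [], []
    for eps in significance_levels:
        pred_sets = p > eps
        # Error: the true class `label` is not in the prediction set.
        errors.append(np.mean(~pred_sets[:, label]))
        # Efficiency: exactly one class in the prediction set.
        efficiencies.append(np.mean(pred_sets.sum(axis=1) == 1))
    return np.array(errors), np.array(efficiencies)


def summarise_folds(per_fold_curves):
    """Mean and standard deviation per significance level across CV folds.

    per_fold_curves: array-like of shape (n_folds, n_levels).
    """
    curves = np.asarray(per_fold_curves, dtype=float)
    return curves.mean(axis=0), curves.std(axis=0)
```

Calling `classwise_curve` once per fold and per class, and passing the stacked results to `summarise_folds`, yields the mean line and the width of the shaded area for each class.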