For a well-calibrated output, we expect a proportion x of the output probabilities with Pr[AI] ≈ x to be true positives. It has been noted elsewhere (Guo et al., 2017) that CNNs may produce improperly calibrated probabilities. Moreover, even if the probabilities are calibrated with respect to the validation dataset (which has even class ratios), this calibration is unlikely to hold for empirical data, because the ratio of AI to not-AI windows in the genome is highly skewed.
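The definition above can be made concrete with a minimal Python sketch that bins predicted probabilities and compares the mean prediction in each bin with the observed fraction of true positives, which is the comparison a reliability plot displays. The function name `reliability_curve`, the number of bins, and the toy inputs are illustrative assumptions and are not part of the published pipeline.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed positive fraction, count)
    for each non-empty bin of equal width on [0, 1]."""
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy inputs (hypothetical): 1 of 10 windows scored 0.1 is a true positive,
# and 9 of 10 windows scored 0.9 are true positives.
probs = [0.1] * 10 + [0.9] * 10
labels = [1] + [0] * 9 + [1] * 9 + [0]
for mean_pred, frac_pos, count in reliability_curve(probs, labels, n_bins=5):
    print(f"{mean_pred:.2f} {frac_pos:.2f} {count}")
```

For these toy inputs the mean prediction and the observed positive fraction agree in each bin, which is what a well-calibrated output looks like. Systematic departures between the two columns would indicate miscalibration.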

We tested three calibration methods: beta calibration (Kull et al., 2017), isotonic regression (Chakravarti, 1989), and Platt scaling (Platt, 1999). To calibrate our CNN output, we first resampled our training dataset to the desired class ratios. We then fit each calibrator to predict the true class of each window in the resampled training dataset from the CNN prediction for that window. To assess the calibration procedure, we inspected reliability plots for our calibrated and uncalibrated predictions, as evaluated with a resampled validation dataset (Figure 4—figure supplement 1, Figure 4—figure supplement 2, Figure 4—figure supplement 3, Figure 4—figure supplement 4). We also checked whether the sum of the residuals was normally distributed, following the approach of Turner et al., 2019. Both beta calibration and isotonic regression gave well-calibrated probabilities compared with the uncalibrated model outputs. For our predictions on empirical data, we chose to apply beta calibration because of its relative simplicity.
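The two-step procedure described above (resample to a target class ratio, then fit a calibrator mapping CNN predictions to true classes) can be sketched in plain Python. For brevity the sketch uses isotonic regression, one of the three methods tested, implemented with the pool-adjacent-violators algorithm and applied as a step function. It is not the beta calibration ultimately applied to the empirical data, nor the implementation used in the study. The function names, the `pos_fraction` value, and the toy scores are all hypothetical.

```python
import bisect
import random


def resample_to_ratio(scores, labels, pos_fraction, n, seed=0):
    """Draw n (score, label) pairs with replacement so that a fraction
    pos_fraction of them come from the positive class."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    n_pos = round(n * pos_fraction)
    sample = [(rng.choice(pos), 1) for _ in range(n_pos)]
    sample += [(rng.choice(neg), 0) for _ in range(n - n_pos)]
    return sample


def fit_isotonic(sample):
    """Pool-adjacent-violators fit of label on score.
    Returns a list of (upper score of block, calibrated probability)."""
    totals = {}  # pool tied scores first: score -> [sum of labels, count]
    for s, y in sample:
        t = totals.setdefault(s, [0, 0])
        t[0] += y
        t[1] += 1
    blocks = []  # each block is [sum of labels, count, upper score]
    for s in sorted(totals):
        blocks.append([totals[s][0], totals[s][1], s])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def apply_isotonic(calibrator, score):
    """Map an uncalibrated score to the probability of its block."""
    uppers = [u for u, _ in calibrator]
    i = min(bisect.bisect_left(uppers, score), len(calibrator) - 1)
    return calibrator[i][1]


# Toy CNN outputs from a class-balanced set (hypothetical values).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
# Assumed target ratio of 10% positives; the actual ratio is not stated here.
sample = resample_to_ratio(scores, labels, pos_fraction=0.1, n=1000)
calibrator = fit_isotonic(sample)
print([round(apply_isotonic(calibrator, s), 3) for s in scores])
```

Because the calibrator is fit to data resampled to a skewed class ratio, the calibrated probabilities for a given CNN score are generally lower than they would be under even class ratios, reflecting the rarity of the positive class.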
