The impact of calibration on a model’s performance
Accuracy, log-loss, and Brier score usually improve with calibration. However, because calibration still involves fitting an approximate model to the calibration curve plotted on a held-out calibration dataset, it can occasionally worsen accuracy or other performance metrics by small amounts. Nevertheless, the benefit of calibrated probabilities, namely interpretable values that actually represent likelihoods, far outweighs this slight performance impact.
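To make this concrete, the following is a minimal sketch (not taken from this chapter's code) that compares accuracy, log-loss, and Brier score before and after sigmoid (Platt) calibration. It assumes scikit-learn and a synthetic imbalanced dataset from make_classification; the choice of classifier and hyperparameters is illustrative only:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary classification problem (illustrative only)
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Uncalibrated model
raw_model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Calibrated model: sigmoid (Platt) scaling fitted via internal cross-validation
calibrated_model = CalibratedClassifierCV(
    RandomForestClassifier(random_state=42), method="sigmoid", cv=5
).fit(X_train, y_train)

for name, model in [("uncalibrated", raw_model), ("calibrated", calibrated_model)]:
    proba = model.predict_proba(X_test)[:, 1]
    print(
        f"{name:>12}: accuracy={accuracy_score(y_test, (proba >= 0.5).astype(int)):.3f}  "
        f"log-loss={log_loss(y_test, proba):.3f}  "
        f"Brier={brier_score_loss(y_test, proba):.3f}"
    )

On a run like this, you will typically see log-loss and the Brier score improve after calibration while accuracy stays roughly the same or shifts only marginally, which mirrors the behavior described above.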
As discussed in Chapter 1, Introduction to Data Imbalance in Machine Learning, ROC-AUC is a rank-based metric: it evaluates the model's ability to distinguish between classes based on the ranking of the predicted scores rather than their absolute values, and it makes no claim about the accuracy of the probability estimates themselves. Strictly monotonic calibration functions, which only increase (or only decrease) without any flat regions, preserve the relative ordering of the predicted scores. Consequently, calibration methods that apply a strictly increasing mapping to the scores, such as sigmoid (Platt) scaling, leave the model's ROC-AUC unchanged.
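The following short sketch (synthetic labels and scores, not from this chapter's examples) illustrates this rank-preservation property: applying a strictly increasing function such as a sigmoid to the raw scores changes their values but not their ordering, so the ROC-AUC is identical before and after the transformation:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)              # synthetic binary labels
raw_scores = rng.normal(size=1000) + 1.5 * y_true   # uncalibrated model scores

# A strictly increasing mapping (here, a sigmoid) changes the score values
# but not their relative ordering
calibrated_scores = 1.0 / (1.0 + np.exp(-raw_scores))

print(roc_auc_score(y_true, raw_scores))         # roughly 0.85 on this synthetic data
print(roc_auc_score(y_true, calibrated_scores))  # exactly the same value: ranks are unchanged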