Introduction to model calibration
What is the difference between stating “The model predicted the transaction as fraudulent” and “The model estimated a 60% probability of the transaction being fraudulent”? When would one statement be more useful than the other?
The difference between the two is that the second statement represents likelihood. This likelihood can be useful in understanding the model’s confidence, which is needed in many applications, such as in medical diagnosis. For example, the prediction that a patient is 80% likely or 80% probable to have cancer is more useful to the doctor than just predicting whether the patient has cancer or not.
A model is considered calibrated if there is a match between the number of positive classes and predicted probability. Let’s try to understand this further. Let’s say we have 10 observations, and for each of them, the model predicts a probability of 0.7 to be of the positive class...