Model selection
What are we to make of all this? We have the confusion matrices from our models to guide us, but we can be a little more sophisticated when selecting among classification models. An effective tool for comparing classification models is the Receiver Operating Characteristic (ROC) chart. Put simply, ROC is a technique for visualizing, organizing, and selecting classifiers based on their performance (Fawcett, 2006). On the ROC chart, the y-axis is the True Positive Rate (TPR) and the x-axis is the False Positive Rate (FPR). Both rates are simple to calculate:
- TPR = Positives correctly classified / total positives
- FPR = Negatives incorrectly classified / total negatives
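As a minimal sketch of the two calculations above, the following Python snippet counts the four confusion-matrix cells for a binary classifier and derives TPR and FPR from them. The function name `tpr_fpr` and the toy label vectors are illustrative assumptions, not output from the models discussed here, and the code assumes that both classes appear at least once so that neither denominator is zero.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives correctly classified
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives incorrectly classified
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly classified

    tpr = tp / (tp + fn)  # correctly classified positives / total positives
    fpr = fp / (fp + tn)  # incorrectly classified negatives / total negatives
    return tpr, fpr


# Hypothetical toy data: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Each probability cutoff applied to a model's predicted scores yields one such (FPR, TPR) pair, and the ROC chart is the set of these pairs plotted across all cutoffs.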
Plotting the ROC results generates a curve, from which you can compute the Area Under the Curve (AUC). The AUC provides an effective indicator of performance, and it can be shown that the AUC is equal to the probability that the observer will correctly...