Evaluating classification models
Now that we have fit a classification model, we can examine its accuracy on the test set. One common tool for this kind of analysis is the Receiver Operating Characteristic (ROC) curve. To draw an ROC curve, we select a particular cutoff for the classifier (here, a value between 0 and 1 above which we consider a data point to be classified as a positive, or 1) and ask what fraction of 1s are correctly classified by this cutoff (the true positive rate) and, concurrently, what fraction of negatives are incorrectly predicted to be positive (the false positive rate). Mathematically, this is represented by choosing a threshold and computing four values:
TP = true positives = # of class 1 points above the threshold
FP = false positives = # of class 0 points above the threshold
TN = true negatives = # of class 0 points below the threshold
FN = false negatives = # of class 1 points below the threshold
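
To make the four counts concrete, here is a minimal sketch in Python with NumPy. The function names (confusion_counts, tpr_fpr) and the toy labels and scores are hypothetical and chosen only for illustration; they are not taken from the model fit earlier. The sketch also assumes that a score exactly equal to the threshold counts as "below" it, and that both classes appear in the data, so neither rate divides by zero.

```python
import numpy as np


def confusion_counts(y_true, scores, threshold):
    """Return (TP, FP, TN, FN) for 0/1 labels at a single threshold."""
    y_true = np.asarray(y_true)
    # Assumption: a score equal to the threshold is treated as "below" it.
    above = np.asarray(scores) > threshold
    tp = int(np.sum(above & (y_true == 1)))   # class 1 points above the threshold
    fp = int(np.sum(above & (y_true == 0)))   # class 0 points above the threshold
    tn = int(np.sum(~above & (y_true == 0)))  # class 0 points below the threshold
    fn = int(np.sum(~above & (y_true == 1)))  # class 1 points below the threshold
    return tp, fp, tn, fn


def tpr_fpr(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at a single threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    tpr = tp / (tp + fn)  # fraction of class 1 points classified as positive
    fpr = fp / (fp + tn)  # fraction of class 0 points classified as positive
    return tpr, fpr


# Hypothetical toy data: true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]

print(confusion_counts(y_true, scores, 0.5))  # (2, 1, 2, 1)
print(tpr_fpr(y_true, scores, 0.5))
```

Each threshold yields one (false positive rate, true positive rate) pair. Evaluating tpr_fpr over a range of thresholds between 0 and 1 gives the set of points from which an ROC curve is drawn.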
The true positive rate (TPR) plotted...