So far, we have covered our first machine learning classifier in depth and evaluated its performance by prediction accuracy. Beyond accuracy, there are several measurements that give us more insight and help us avoid being misled by class imbalance (see the short sketch after the following list). They are as follows:
- Confusion matrix
- Precision
- Recall
- F1 score
- Area under the ROC curve (AUC)
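To see why accuracy alone can be misleading under class imbalance, consider a minimal sketch with made-up labels (not our spam dataset): a classifier that always predicts the majority class scores 95 percent accuracy on a 95/5 split, yet catches none of the positive instances:

>>> import numpy as np
>>> from sklearn.metrics import accuracy_score, recall_score
>>> # hypothetical test set: 95 negative and 5 positive instances
>>> y_true = np.array([0] * 95 + [1] * 5)
>>> y_pred = np.zeros(100, dtype=int)  # always predict the negative class
>>> print(accuracy_score(y_true, y_pred))   # looks impressive...
0.95
>>> print(recall_score(y_true, y_pred))     # ...but no positive is caught
0.0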
A confusion matrix summarizes testing instances by their predicted values and true values, presented as a contingency table in which rows correspond to true classes and columns to predicted classes:

|                 | Predicted negative  | Predicted positive  |
|-----------------|---------------------|---------------------|
| Actual negative | True negative (TN)  | False positive (FP) |
| Actual positive | False negative (FN) | True positive (TP)  |
To illustrate this, we compute the confusion matrix of our Naïve Bayes classifier. Here, the confusion_matrix function from scikit-learn is used, although it is straightforward to code ourselves, as sketched after the output below:
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[1102   89]
 [  31  485]]
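Since computing the matrix by hand is indeed simple, here is a minimal hand-rolled sketch (the helper name compute_confusion_matrix is ours, not part of scikit-learn) that just counts each pair of true and predicted labels:

>>> import numpy as np
>>> def compute_confusion_matrix(y_true, y_pred, labels):
...     # cell [i, j] counts instances whose true label is labels[i]
...     # and whose predicted label is labels[j]
...     index = {label: i for i, label in enumerate(labels)}
...     matrix = np.zeros((len(labels), len(labels)), dtype=int)
...     for true, pred in zip(y_true, y_pred):
...         matrix[index[true], index[pred]] += 1
...     return matrix
...
>>> print(compute_confusion_matrix(Y_test, prediction, labels=[0, 1]))
[[1102   89]
 [  31  485]]

The row order follows the labels argument, matching scikit-learn's convention of true labels along rows and predicted labels along columns.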
Note that we consider 1, the spam class, to be positive. From the confusion...