Evaluating classification models
In the previous recipe, we learned how to choose our training hyperparameters to optimize training, and we verified how those choices affected the training and validation losses. In this recipe, we are going to explore how those choices affect the model's performance in the real world. You will have noticed that we split the dataset into three different sets: training, validation, and test. However, during training, we only used the training and validation sets. In this recipe, we will emulate real-world behavior by using data that our model has not seen before: the test set.
Getting ready
When evaluating a model, we can perform qualitative evaluation and quantitative evaluation.
Qualitative evaluation consists of selecting one or more random (or not so random, depending on what we are looking for) samples, analyzing the model's output for each of them, and verifying whether it matches our expectations.
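As a minimal sketch of this kind of qualitative check, the snippet below draws a few random test-set indices and pairs each prediction with its expected label so that they can be inspected by hand. The helper name `sample_for_inspection` and the toy `test_labels` and `test_predictions` lists are hypothetical, introduced only for illustration; in practice, the predictions would come from running the trained model on the test set.

```python
import random


def sample_for_inspection(predictions, labels, k=5, seed=0):
    """Pick k random indices and pair each prediction with its label.

    Hypothetical helper for illustration; returns a list of
    (index, predicted, expected, is_match) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the same samples can be revisited
    indices = rng.sample(range(len(labels)), k)  # k distinct indices
    return [
        (i, predictions[i], labels[i], predictions[i] == labels[i])
        for i in indices
    ]


# Toy stand-ins for real test-set labels and model outputs (assumed values)
test_labels = [0, 1, 2, 1, 0, 2, 1, 0]
test_predictions = [0, 1, 1, 1, 0, 2, 0, 0]

for idx, pred, label, ok in sample_for_inspection(test_predictions, test_labels, k=3):
    status = "OK" if ok else "MISMATCH"
    print(f"sample {idx}: predicted={pred} expected={label} {status}")
```

For a "not so random" selection, the same idea applies with the indices chosen deliberately, for example by keeping only the samples where the prediction and the label disagree.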
In this recipe, we will compute the evaluation metrics...