Model evaluation strategies come in many shapes and forms. In the following sections, we will therefore highlight three of the most commonly used techniques for comparing models against each other (the first of which is sketched briefly right after this list):
- k-fold cross-validation
- Bootstrapping
- McNemar's test
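As a quick preview of the first technique in this list, here is a minimal sketch, assuming scikit-learn and an illustrative dataset (neither is prescribed by this section): two classifiers are scored with 5-fold cross-validation so that their average fold accuracies can be compared.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and models; any estimators with fit/predict would do
X, y = load_breast_cancer(return_X_y=True)
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(random_state=1),
}

# 5-fold cross-validation: each model is trained on four folds and
# scored on the held-out fold, rotating through all five splits
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```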
In principle, model evaluation is simple: after training a model on some data, we can estimate its effectiveness by comparing the model's predictions to the ground-truth values. We learned early on that we should split the data into training and test sets, and we tried to follow this practice whenever possible. But why exactly did we do that again?
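Before answering that question, here is a minimal sketch of the workflow just described, again assuming scikit-learn and an illustrative dataset rather than anything specific to this section: the data is split into training and test sets, a model is fit on the training portion, and its predictions are compared against the held-out ground-truth labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 30% of the data so that evaluation uses examples
# the model never saw during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Compare the model's predictions with the ground-truth test labels
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```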