How do I know whether this model will be accurate?
As decision-makers, you need to be confident that the machine learning models you’re using provide you with reliable, accurate predictions or insights. However, how can you be sure? What metrics should you use to evaluate your models? And what do these metrics really mean?
Let’s look at how metrics are used to evaluate machine learning models and walk through some common examples.
Evaluating on test (holdout) data
Before we get into the specifics of the different evaluation metrics, you first need to understand the importance of evaluating on test (a.k.a. holdout) data.
Holdout data is a subset of your data that the model has not seen during training or validation. Evaluating your model on this held-out set gives you a more realistic estimate of how it will perform in the real world.
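As a minimal sketch of this idea, the example below holds out a portion of a dataset and evaluates only on that unseen portion. The synthetic data, the logistic regression model, and the accuracy metric are all illustrative choices (using scikit-learn), not prescriptions from this article.

```python
# A minimal sketch of a train/test (holdout) split using scikit-learn.
# The dataset, model, and metric here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification data standing in for your real dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # fit only on the training portion

# Scoring on the holdout set gives a more realistic performance estimate
# than scoring on the data the model was trained on.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Holdout accuracy: {test_accuracy:.3f}")
```

The key point is that `X_test` and `y_test` play no part in fitting the model, so the reported accuracy reflects performance on genuinely unseen data.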
This test data should follow...