Debugging data
As we discussed in the first chapter of this book, machine learning models are a function of their training data: bad data leads to bad models, or, as we put it, garbage in, garbage out. If your project is failing, your data is the most likely culprit. In this chapter we will therefore look at the data first, before moving on to the other possible issues that might cause our model to fail.
Even if you have a working model, however, the real-world data coming in might not be up to the task. In this section, we will learn how to find out whether you have good data, what to do if you have not been given enough data, and how to test your data.
How to find out whether your data is up to the task
There are two questions to ask when deciding whether your data is up to the task of training a good model:
Does the data predict what you want it to predict?
Do you have enough data?
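Both questions can also be probed empirically. The following is a minimal sketch of one common approach, not necessarily the procedure this chapter goes on to describe: compare a deliberately simple model against a majority-class baseline on held-out data, and repeat the comparison at growing training-set sizes. Everything in it is an assumption made for illustration, including the toy data from the hypothetical `make_blobs` helper, the nearest-centroid model, and the function names.

```python
import random
from collections import Counter


def make_blobs(n_per_class=200, gap=3.0, seed=0):
    """Hypothetical toy data: two 2-D Gaussian clusters, labelled 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * gap, 1.0), rng.gauss(label * gap, 1.0)])
            y.append(label)
    return X, y


def baseline_accuracy(train_y, test_y):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return sum(label == majority for label in test_y) / len(test_y)


def centroid_accuracy(train_X, train_y, test_X, test_y):
    """Accuracy of a nearest-centroid classifier, a deliberately simple model."""
    counts = Counter(train_y)
    sums = {}
    for row, label in zip(train_X, train_y):
        total = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            total[i] += value
    centroids = {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

    def predict(row):
        # Pick the class whose mean point is closest (squared Euclidean distance).
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return sum(predict(r) == lab for r, lab in zip(test_X, test_y)) / len(test_y)


def learning_curve(X, y, train_sizes, test_fraction=0.25, seed=0):
    """Return (n_train, model_accuracy, baseline_accuracy) for each training size."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_test = int(len(order) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    test_X, test_y = [X[i] for i in test], [y[i] for i in test]
    curve = []
    for n in train_sizes:
        subset = train[:n]
        sub_X, sub_y = [X[i] for i in subset], [y[i] for i in subset]
        curve.append((
            len(subset),
            centroid_accuracy(sub_X, sub_y, test_X, test_y),
            baseline_accuracy(sub_y, test_y),
        ))
    return curve


X, y = make_blobs()
for n, model_acc, base_acc in learning_curve(X, y, train_sizes=[20, 80, 300]):
    print(f"train={n:4d}  model={model_acc:.2f}  baseline={base_acc:.2f}")
```

If even the simple model barely beats the baseline, the features may carry little signal about the target. If held-out accuracy is still climbing at the largest training size, more data is likely to help. Both readings are rough indications rather than proof.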
To find out whether your...