Data preprocessing is the explicit process of ensuring that your data can be ingested cleanly by your algorithm. In this section, you will learn how to prepare data for the ML work in later sections.
Data preprocessing
Getting ready
Why even worry about preprocessing? The simple steps are the easiest to overlook. As we ingest data into our algorithms, we need to ensure that each data point is both useful and accurate. In a supervised learning problem, this means that both the X data and the Y labels must be correct before they reach a learner. So, how do we ensure that each data point is correct? For large datasets, we can look at macro metrics such as three-sigma outliers, that is, values lying more than three standard deviations from the mean. For smaller datasets...
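The three-sigma check mentioned above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed implementation: the function name `three_sigma_outliers`, the `n_sigma` parameter, and the synthetic readings are all assumptions made for the example.

```python
import random
import statistics


def three_sigma_outliers(values, n_sigma=3.0):
    """Return (index, value) pairs more than n_sigma standard deviations from the mean.

    Hypothetical helper for illustration; it is not part of any library.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing can be flagged
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) > n_sigma * std]


# Synthetic data (an assumption for this sketch): 1,000 well-behaved readings
# plus one obviously bad value appended at the end.
random.seed(0)
data = [random.gauss(50.0, 5.0) for _ in range(1000)]
data.append(500.0)

outliers = three_sigma_outliers(data)
print(outliers)
```

Note that the extreme value itself inflates the mean and standard deviation used for the test. With many data points this effect is diluted, which is one reason a three-sigma rule is better suited to large datasets than to small ones.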