Data preparation
Before an algorithm can be applied to a dataset, the dataset needs to follow a certain structure and be free of errors. This section focuses on the methods and techniques used to make a dataset ready for the algorithms.
Supervised learning dataset
Since DataRobot mostly works with supervised learning problems, we will focus only on datasets for supervised machine learning (other types will be covered in a later section). In a supervised machine learning problem, we provide all the answers as part of the dataset. Imagine a table of data where each row contains a set of clues together with its corresponding answer (Figure 2.1).
This dataset is made up of columns that contain clues (called features) and one column that contains the answers (called the target). Given a dataset that looks like this, the algorithm learns how to...
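To make the split between features and target concrete, here is a minimal Python sketch that separates a small table into its clue columns and its answer column. The column names (`sqft`, `bedrooms`, `price`), the values, and the helper `split_features_target` are hypothetical illustrations. They are not taken from Figure 2.1 or from DataRobot.

```python
from typing import Any

# Hypothetical toy dataset (not the data shown in Figure 2.1).
# Each row holds clue columns (features) plus one answer column (target).
rows: list[dict[str, Any]] = [
    {"sqft": 850, "bedrooms": 2, "price": 210_000},
    {"sqft": 1200, "bedrooms": 3, "price": 295_000},
    {"sqft": 1600, "bedrooms": 4, "price": 360_000},
]

TARGET = "price"  # assumed name of the target (answer) column


def split_features_target(
    data: list[dict[str, Any]], target: str
) -> tuple[list[dict[str, Any]], list[Any]]:
    """Split each row into its features (clues) and its target (answer)."""
    # Keep every column except the target as a feature.
    features = [{k: v for k, v in row.items() if k != target} for row in data]
    # Collect the target value from each row, in the same order.
    labels = [row[target] for row in data]
    return features, labels


X, y = split_features_target(rows, TARGET)
print(X[0])  # {'sqft': 850, 'bedrooms': 2}
print(y)     # [210000, 295000, 360000]
```

Plain dictionaries keep the sketch dependency-free. In practice, a tabular library such as pandas would usually hold the table, but the idea is the same: the algorithm sees the feature columns as inputs and learns to predict the target column.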