Supervised learning algorithms work by extracting knowledge from a knowledge base (KB), that is, the dataset that contains labeled instances of the concept we need to learn about.
Supervised learning algorithms are two-phase algorithms. Given a supervised learning problem—let's say, a classification problem—the algorithm tries to solve it during the first phase, called the training phase, and its performance is measured in the second phase, called the testing phase.
The three dataset splits (train, validation, and test), as defined in the previous section, and the two-phase algorithm should sound an alarm: why do we have a two-phase algorithm and three dataset splits?
Because the first phase (should—in a well-made pipeline) uses two datasets. In fact, we can define the stages:
- Training and validation: The algorithm analyzes the dataset...