The model discussed in the previous section is a simple one; it achieves 96.0 percent accuracy on the whole dataset. However, this evaluation is almost certainly overly optimistic: we used the data to define what the tree would look like, and then we used the same data to evaluate the model. Of course the model performs well on this dataset; it was optimized to do exactly that. The reasoning is circular.
What we really want is to estimate the model's ability to generalize to new instances, so we should measure its performance on instances the algorithm did not see during training. Therefore, we are going to do a more rigorous evaluation using held-out data. To do this, we split the data into two groups: we train the model on one group and test it on the other, as in the sketch below.
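Here is a minimal sketch of this split in Python, assuming scikit-learn; the `DecisionTreeClassifier` and the Iris dataset are stand-ins for the model and data of this section, which are not shown here.

```python
# A minimal sketch of a held-out evaluation, assuming scikit-learn.
# The Iris dataset is a placeholder for the data used in this section.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out 30% of the instances for testing; the split is random,
# so we fix a seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the tree on the training portion only...
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)

# ...and evaluate it on instances it has never seen.
test_accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

Accuracy measured this way on the test group is typically lower than the optimistic figure obtained on the training data, and it is a more honest estimate of how the model will behave on new instances.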