Hyperparameters
In the previous section, we gave a conceptual overview of how decision trees are constructed. In particular, we established that one criterion for deciding where a decision tree should be split (in other words, when a new path should be added to our conceptual flowchart) is the purity of the resulting nodes. We also noted that allowing the algorithm to focus exclusively on node purity as the criterion for constructing the tree quickly leads to trees that overfit the training data. Such trees are so finely tuned to the training data that they capture not only the most salient features for classifying a given data point but also the noise in the data, modeling it as though it were a real signal. Therefore, while a decision tree that is allowed to optimize for specific metrics without restriction will perform very well on the training data, it will neither perform well on the testing dataset nor generalize to unseen data.
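To make this concrete, here is a minimal pure-Python sketch (not any particular library's implementation, and the dataset, function names, and `max_depth` parameter are illustrative assumptions): it grows a tree on a small one-dimensional dataset containing noisy labels, using Gini impurity as the purity criterion. Grown without restriction, the tree keeps splitting until every node is pure, chasing the noisy points; capping its depth with a `max_depth` hyperparameter stops it after the single most informative split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). A value of 0.0 means a perfectly pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Find the threshold on the single feature that minimizes weighted Gini impurity."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must put at least one point on each side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best

def grow(xs, ys, depth=0, max_depth=None):
    """Recursively split until nodes are pure -- or until max_depth is reached."""
    split = best_split(xs, ys)
    if gini(ys) == 0.0 or split is None or (max_depth is not None and depth >= max_depth):
        return Counter(ys).most_common(1)[0][0]  # leaf: predict the majority class
    _, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            grow([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([x for x, _ in right], [y for _, y in right], depth + 1, max_depth))

def depth_of(node):
    """A leaf (a bare class label) has depth 0; an internal node adds one level."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(depth_of(node[1]), depth_of(node[2]))

# The dominant pattern is 'a' below x = 5 and 'b' above it,
# but two points (x = 4.5 and x = 9) carry noisy labels.
xs = [1, 2, 3, 4, 4.5, 6, 7, 8, 9, 10]
ys = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'a', 'b']

unrestricted = grow(xs, ys)             # splits until pure: models the noise too
capped = grow(xs, ys, max_depth=1)      # one split, capturing only the main boundary
print(depth_of(unrestricted), depth_of(capped))  # → 3 1
```

The unrestricted tree needs three levels of splits to carve out the two noisy points, while the depth-capped tree makes a single split near the true boundary. This is the essence of a hyperparameter such as a depth limit: it deliberately stops the purity-driven optimization before the tree starts treating noise as signal.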