Decision trees
In the previous section, we computed the information gain for a given split. Recall that it is calculated by taking the Gini impurity of the parent node and subtracting the weighted Gini impurity of each child (leaf) node. A higher information gain is better, because it means our split has successfully reduced the impurity of the child nodes. However, we still need to know how candidate splits are produced so that they can be evaluated.
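To make this comparison concrete, here is a minimal sketch of how the Gini impurity and the information gain of a binary split could be computed. It assumes the class labels are kept in plain Python lists; the function names are illustrative rather than taken from any particular library.

```python
def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def information_gain(parent_labels, left_labels, right_labels):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent_labels)
    weighted_children = (
        len(left_labels) / n * gini_impurity(left_labels)
        + len(right_labels) / n * gini_impurity(right_labels)
    )
    return gini_impurity(parent_labels) - weighted_children

# Example: a split that separates the classes perfectly achieves the maximum gain.
parent = [0, 0, 1, 1]
print(information_gain(parent, [0, 0], [1, 1]))  # 0.5
```

A perfectly separating split drives both child impurities to zero, so the gain equals the parent's impurity; a split that leaves the children as mixed as the parent yields a gain of zero.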
For each split, beginning with the root, the algorithm scans all the features in the data and samples a number of random values from each. There are various strategies for choosing these values; for the general use case, we will describe an approach that samples k random values per feature:
- For each of the sampled values of each feature, we form a candidate split
- Rows whose feature value is at or below the sampled value go in one direction, say to the left, and rows whose value is above it go in the other direction, that is, to the right
- Now, for each candidate split, we compute the information gain and select the feature-value combination that yields the highest gain, as sketched in the example after this list
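The sketch below illustrates that search, assuming the data is a plain list of feature rows X with class labels y. The function name best_random_split, the parameter k (the number of values sampled per feature), and the tiny dataset are all illustrative, and the impurity helpers are restated from the previous sketch so the example runs on its own.

```python
import random

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (restated from above)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def information_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def best_random_split(X, y, k=3, seed=0):
    """Scan every feature, sample k of its values, and keep the split with the best gain."""
    rng = random.Random(seed)
    best = None  # (gain, feature_index, threshold)
    for feature in range(len(X[0])):
        column = [row[feature] for row in X]
        # Sample up to k candidate threshold values from this feature's observed values.
        for threshold in rng.sample(column, min(k, len(column))):
            # Rows at or below the threshold go left, rows above it go right.
            left = [label for value, label in zip(column, y) if value <= threshold]
            right = [label for value, label in zip(column, y) if value > threshold]
            if not left or not right:
                continue  # degenerate split: every row went the same way
            gain = information_gain(y, left, right)
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best

# Example usage on a tiny two-feature dataset: the second feature separates
# the classes cleanly, the first does not.
X = [[2.0, 7.1], [3.6, 6.8], [1.5, 1.2], [3.9, 0.8]]
y = [0, 0, 1, 1]
print(best_random_split(X, y, k=2))
```

Because only k values are sampled per feature, the split found this way is not guaranteed to be the globally best one; that randomness is deliberate, trading a little per-split quality for speed and, in ensembles, for diversity between trees.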