Using decision trees for discretization
In the previous recipes in this chapter, we set the number of intervals arbitrarily, and the discretization algorithm then found the interval limits in one way or another. Decision trees, in contrast, can find both the interval limits and the optimal number of bins automatically.
Decision trees discretize continuous attributes as part of the learning process. At each node, the tree evaluates the candidate cut points for a feature and selects the one that maximizes class separation, or sample coherence, according to a performance metric such as entropy or Gini impurity for classification, or the squared or absolute error for regression. As a result, observations end up in particular leaves based on whether their feature values are greater or smaller than the chosen cut points.
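To make this concrete, here is a minimal sketch of the idea using scikit-learn. The data and parameter values are illustrative assumptions, not part of the recipe; it fits a shallow regression tree to a single feature and recovers the cut points the tree learned:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: one continuous feature and a continuous target.
rng = np.random.RandomState(42)
X = rng.uniform(1, 10, size=(500, 1))        # e.g., number of rooms
y = 50 * X.ravel() + rng.normal(0, 25, 500)  # e.g., house price

# A shallow tree: max_depth=2 yields at most 4 leaves, that is, 4 bins.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# Internal nodes store the learned cut points; leaf nodes store -2.
cut_points = tree.tree_.threshold[tree.tree_.threshold != -2]
print(sorted(cut_points))

# Each observation's bin is the leaf it lands in; the tree's
# prediction (the leaf mean) can serve as the discretized value.
bins = tree.apply(X)           # leaf index per observation
discretized = tree.predict(X)  # leaf mean per observation
```

Note that the number of bins is controlled indirectly through tree hyperparameters such as max_depth or max_leaf_nodes, rather than set explicitly.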
In the following figure, we can see a diagram of a decision tree trained to predict house prices based on the property's average number of rooms...
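As a rough stand-in for that figure, the following sketch trains such a tree and prints its structure. It assumes the California housing dataset that ships with scikit-learn, which may not be the exact data used in the recipe:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor, export_text

# Load house-price data and keep only the average number of rooms.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
rooms = X[["AveRooms"]]

# A depth-2 tree splits the feature into at most 4 intervals.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(rooms, y)

# Each leaf in the printout corresponds to one interval and predicts
# the mean house price of the observations that fall into it.
print(export_text(tree, feature_names=["AveRooms"]))
```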