Key concepts
Decision trees are an exceptionally useful machine learning tool. They are non-parametric, easy to interpret, and can work with a wide range of data. They make no assumptions about the linearity of relationships between features and targets, or about the normality of error terms, and the data doesn't even need to be scaled. Decision trees also often do a good job of capturing complex relationships between predictors and targets.
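The scaling claim is easy to check empirically. The sketch below (using scikit-learn, which the text doesn't name but is a standard choice) fits one tree on raw features and another on the same features multiplied by a large constant; because splits depend only on the ordering of feature values, the predictions come out identical:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fit one tree on the raw features and another on crudely rescaled features.
raw = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled = DecisionTreeClassifier(random_state=0).fit(X * 1000.0, y)

# Splits are rank-based, so a monotonic rescaling of the features
# leaves the learned partitioning, and hence the predictions, unchanged.
print((raw.predict(X) == scaled.predict(X * 1000.0)).all())
```

This is in contrast to distance- or gradient-based methods (k-nearest neighbors, regularized regression), where feature scale directly affects the fit.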
The flexibility of the decision tree algorithm, and its ability to model complicated and unanticipated relationships in the data, comes from the recursive partitioning procedure used to segment the data. Decision trees group observations based on the values of their features. This is done through a series of binary decisions, starting with an initial split at the root node and ending with a leaf for each grouping. Each split is based on the feature, and feature value, that provides the most information about the target. More precisely...
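The root-to-leaf structure described above can be made concrete with a small sketch (again assuming scikit-learn, which the text doesn't name): fitting a shallow tree and printing its learned rules shows the binary split at the root, the subsequent splits, and the leaves that define the final groupings:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# A shallow tree keeps the printed structure readable: one root split,
# at most one more level of splits, then leaves.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the recursive partitioning as nested if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Each printed condition of the form `feature <= threshold` is one binary decision; observations flow down the branches until they reach a leaf, which holds the predicted class for that grouping.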