Summary
In this chapter, we introduced decision trees as a particular kind of classifier. The basic idea behind their concept is that a decision process can become sequential by using splitting nodes, where, according to the sample, a branch is chosen until we reach a final leaf. In order to build such a tree, the concept of impurity was introduced; starting from a complete dataset, our goal is to find a split point that creates two distinct sets that should share the minimum number of features and, at the end of the process, should be associated with a single target class. The complexity of a tree depends on the intrinsic purity—in other words, when it's always easy to determine a feature that best separates a set, the depth will be lower. However, in many cases, this is almost impossible, so the resulting tree needs many intermediate nodes to reduce the impurity until it reaches the final leaves.
We also discussed some ensemble learning approaches: random forests, AdaBoost, gradient tree...