Summary
In this chapter on the decision trees, we first tried to understand the structure and the meaning of a decision tree. This was followed by a discussion on the mathematics behind creating a decision tree. Apart from implementing a decision tree in Python, the chapter also discussed the mathematics of related algorithms such as regression trees and random forests. Here is a brief summary of the chapter:
- A decision tree is a classification algorithm used when the predictor variables are either categorical or continuous numerical variables.
- Splitting a node into subnodes so that one gets a more homogeneous distribution (similar observations together), is the primary goal while making a tree.
- There are various methods to decide which variable should be used to split the node. These methods include information gain, Gini, and maximum reduction in variance methods.
- The method of building a regression tree is very similar to a decision tree. However, the target variable in the case of a regression...