Summary
In this chapter on decision trees, we first examined the structure and meaning of a decision tree. This was followed by a discussion of the mathematics behind building one. Apart from implementing a decision tree in Python, the chapter also covered the mathematics of related algorithms such as regression trees and random forests. Here is a brief summary of the chapter:
A decision tree is a classification algorithm that can handle predictor variables that are either categorical or continuous numerical variables.
The primary goal while building a tree is to split a node into subnodes in a way that makes each subnode more homogeneous, that is, groups similar observations together.
There are various methods to decide which variable should be used to split a node. These include information gain, the Gini index, and maximum reduction in variance.
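As a small illustration of these splitting criteria (a sketch, not the chapter's own code), the functions below compute the Gini impurity and entropy of a node, and the information gain of a candidate split; the function names are chosen here for clarity:

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Shannon entropy (in bits) of the class distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Drop in entropy from the parent node to the weighted children
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])
left, right = parent[:4], parent[4:]          # a perfectly separating split
print(gini(parent))                           # 0.5 for a 50/50 node
print(information_gain(parent, left, right))  # 1.0: the split removes all impurity
```

A split is chosen by evaluating each candidate variable with one of these criteria and picking the split that maximizes the gain (or minimizes the weighted child impurity).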
The method of building a regression tree is very similar to that of a decision tree. However, the target variable in the case of a regression tree is continuous rather than categorical.
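Because the target is continuous, a regression tree scores candidate splits by the reduction in target variance instead of entropy or Gini. A minimal sketch of that criterion (illustrative names, not the chapter's code):

```python
import numpy as np

def variance_reduction(parent, left, right):
    # Reduction in target variance achieved by a candidate split;
    # the regression-tree analogue of information gain
    n = len(parent)
    weighted = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return np.var(parent) - weighted

y = np.array([10.0, 11.0, 9.0, 50.0, 52.0, 48.0])
left, right = y[:3], y[3:]   # a split separating low targets from high ones
print(variance_reduction(y, left, right))  # 400.0: almost all variance explained
```

At each node, the split with the largest variance reduction is chosen, and each leaf predicts the mean of the target values that fall into it.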