Classification trees and pruning
A classification tree is a decision tree built for classification problems. Breiman, et al. (1984) invented the decision tree, and Quinlan (1984) independently introduced the C4.5 algorithm. The two approaches have a lot in common, but we will focus on the Breiman school of decision trees. Hastie, et al. (2009) give a comprehensive treatment of decision trees, and Zhang and Singer (2010) offer a treatise on recursive partitioning methods. An intuitive and systematic development of trees in R can be found in Chapter 9, Ensembling Regression Models, of Tattar (2017).
A classification tree has many arguments that can be tuned to improve performance. However, we will first simply construct the classification tree with default settings and visualize it. The rpart function from the rpart package can create classification, regression, as well as survival trees. The function first inspects whether the...
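As a minimal sketch of fitting and plotting a tree with default settings, the following uses the built-in iris data. The dataset and the object names iris_tree and pred are assumptions made for illustration only and do not come from the text. Because the response Species is a factor, rpart selects the classification method without being told to.

```r
library(rpart)

# Assumption: iris is a stand-in dataset chosen for illustration
data(iris)

# Default settings; Species is a factor, so a classification tree is fitted
iris_tree <- rpart(Species ~ ., data = iris)

# Text summary of the splits
print(iris_tree)

# Visualize the tree with base graphics
plot(iris_tree, uniform = TRUE, margin = 0.1)
text(iris_tree, use.n = TRUE, cex = 0.8)

# Predicted classes on the training data, cross-tabulated against the truth
pred <- predict(iris_tree, type = "class")
print(table(Predicted = pred, Actual = iris$Species))
```

The resulting table is a resubstitution summary on the training data, so it will look more optimistic than performance on unseen data.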