Summary
This chapter explored how to use decision trees for classification problems. Although the examples in this chapter all involved a binary target, the algorithms we worked with can also handle multiclass problems. Unlike logistic regression, which must be replaced by its multinomial form, tree-based algorithms require few changes to work well when our target has more than two values.
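As a minimal sketch of this point, assuming scikit-learn's `DecisionTreeClassifier` (the chapter's own examples may have used a different implementation), the same fit-and-score interface works unchanged on a three-class target:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris has three classes; nothing about the API changes
# compared with a binary target.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split
```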
We looked at two approaches to dealing with the high variance of decision trees. One is the random forest, a form of bagging that reduces the variance of our predictions. The other is gradient-boosted decision trees. Boosting can capture very complicated relationships in the data, but it carries a non-trivial risk of overfitting, so it is particularly important to tune our hyperparameters with that in mind.
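As an illustrative sketch, assuming scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier` as stand-ins for whichever implementations we used in the chapter, the two ensembles share the same interface; for the boosting model, `learning_rate`, `max_depth`, and `n_estimators` are the hyperparameters most worth tuning with overfitting in mind:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: averaging many decorrelated trees reduces variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: trees are fit sequentially to the previous errors.
# A small learning_rate and shallow max_depth, checked with
# cross-validation, help guard against overfitting.
boost = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0
)

for name, model in [("random forest", forest), ("boosting", boost)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```

Cross-validated scores like these are one simple way to compare the two ensembles while tuning the boosting hyperparameters.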
In the next chapter, we explore another well-known algorithm for classification: K-nearest neighbors.