Chapter 9: K-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosted Regression
As is true for support vector machines, K-nearest neighbors and decision tree models are best known as classification models. However, they can also be used for regression, where they offer some advantages over classical linear regression. K-nearest neighbors and decision trees handle nonlinearity well, and we do not need to assume that features follow a Gaussian distribution. Moreover, by adjusting the value of k for K-nearest neighbors (KNN) or the maximum depth for decision trees, we can avoid fitting the training data too precisely.
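To make the role of these hyperparameters concrete, here is a minimal sketch using scikit-learn's KNeighborsRegressor and DecisionTreeRegressor on a small synthetic nonlinear dataset (the data and the parameter values shown are illustrative assumptions, not examples from this chapter). A larger n_neighbors smooths the KNN predictions, and a smaller max_depth restrains the tree, so both settings trade a little bias for lower variance.

```python
# Illustrative sketch: KNN and decision tree regressors on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A larger k smooths the KNN fit; a smaller max_depth restricts the tree.
knn = KNeighborsRegressor(n_neighbors=15).fit(X_train, y_train)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)

print("KNN R^2 on test data:", knn.score(X_test, y_test))
print("Tree R^2 on test data:", tree.score(X_test, y_test))
```

Lowering n_neighbors or raising max_depth lets either model track the training data more closely, at the risk of overfitting; tuning those values is how we manage the bias-variance trade-off discussed next.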
This brings us back to a theme from the previous two chapters – how to increase model complexity, including accounting for nonlinearity, without overfitting. We have seen how allowing some bias can reduce variance and give us more reliable estimates of model performance. We will continue to explore that balance in this chapter.
Specifically,...