Random forests
The final classifier that we will be discussing in this chapter is the aptly named Random Forest and is an example of a meta-technique called ensemble learning. The idea and logic behind random forests follows.
Given that (unpruned) decision trees can be nearly bias-less high variance classifiers, a method of reducing variance at the cost of a marginal increase of bias could greatly improve upon the predictive accuracy of the technique. One salient approach to reducing the variance of decision trees is to train a bunch of unpruned decision trees on different random subsets of the training data, sampling with replacement—this is called bootstrap aggregating or bagging. At the classification phase, the test observation is run through all of these trees (a forest, perhaps?), and each resulting classification casts a vote for the final classification of the whole forest. The class with the highest number of votes is the winner. It turns out that the consensus among many high-variance...