The ensemble technique, bagging (which stands for bootstrap aggregating), which we briefly mentioned in the first chapter, can effectively overcome overfitting. To recap, different sets of training samples are randomly drawn with replacement from the original training data; each set is used to train an individual classification model. Results of these separate models are then combined together via majority vote to make the final decision.
Tree bagging, as previously described, reduces the high variance that a decision tree model suffers from and hence in general performs better than a single tree. However, in some cases where one or more features are strong indicators, individual trees are constructed largely based on these features and as a result become highly correlated. Aggregating multiple correlated trees will not make much difference. To force each tree to...