The main idea behind ensembles is to combine multiple estimators so that together they make better predictions than any single estimator. However, you should not expect the mere combination of multiple estimators to lead to better results on its own. The combined predictions of multiple estimators that make the exact same mistakes will be just as wrong as each individual estimator in the group. Therefore, it is helpful to think about how the mistakes that individual estimators make can be mitigated. To do so, we have to revisit our old friend, the bias-variance dichotomy. We will meet few machine learning teachers better than this pair.
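We can see this effect with a quick simulation. The following sketch (a toy illustration, not part of any library API; the sample size, error rate, and number of estimators are arbitrary choices) compares majority voting over estimators whose mistakes are independent with voting over estimators that all make identical mistakes:

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 10_000
n_estimators = 9
p_error = 0.3  # each estimator is wrong on 30% of the samples

# True binary labels
y = rng.integers(0, 2, size=n_samples)

# Case 1: each estimator makes *independent* mistakes
independent_preds = np.array([
    np.where(rng.random(n_samples) < p_error, 1 - y, y)
    for _ in range(n_estimators)
])

# Case 2: all estimators make the *exact same* mistakes
shared_errors = rng.random(n_samples) < p_error
identical_preds = np.tile(np.where(shared_errors, 1 - y, y), (n_estimators, 1))

def majority_vote(preds):
    # Predict 1 when more than half of the estimators predict 1
    return (preds.mean(axis=0) > 0.5).astype(int)

acc_independent = (majority_vote(independent_preds) == y).mean()
acc_identical = (majority_vote(identical_preds) == y).mean()

print(f"Single estimator accuracy:     {1 - p_error:.2f}")
print(f"Ensemble (independent errors): {acc_independent:.2f}")
print(f"Ensemble (identical errors):   {acc_identical:.2f}")
```

With independent errors, the ensemble is wrong only when a majority of its members err at once, so its accuracy climbs well above that of any single estimator; with identical errors, voting changes nothing and the ensemble stays stuck at the individual accuracy.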
If you recall from Chapter 2, Making Decisions with Trees, when we allowed our decision trees to grow as much as they could, they tended to fit the training data like a glove but failed to generalize to newer data points. We referred to this as overfitting, and we have seen the same behavior with unregularized linear...