Averaging models into an ensemble
To better introduce the averaging ensembling technique, let’s quickly review the strategies devised by Leo Breiman for ensembling. His work was a milestone for ensembling strategies, and what he found at the time still works fairly well on a wide range of problems.
Breiman explored these possibilities to find out whether there was a way to reduce the variance of the error in powerful models that tend to overfit the training data, such as decision trees.
Conceptually, he discovered that ensembling effectiveness was based on three elements: how we deal with the sampling of training cases, how we build the models, and, finally, how we combine the different models obtained.
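As a rough illustration of these three elements, here is a minimal sketch using only the Python standard library. It is not Breiman's own procedure: the function names (`fit_1nn`, `build_ensemble`, `ensemble_predict`), the toy data, and the choice of a 1-nearest-neighbour regressor as the overfitting-prone model are all assumptions made for the example. Sampling here draws subsets of the rows without replacement, and the combination step is a plain average.

```python
import random
import statistics


def fit_1nn(rows):
    """Return a 1-nearest-neighbour regressor (a deliberately high-variance model)."""
    def predict(x):
        # Answer with the target of the closest training row
        nearest = min(rows, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


def build_ensemble(rows, n_models=25, sample_size=30, seed=0):
    """Fit one model per random subsample of the training rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Element 1, sampling: draw training cases without replacement
        subsample = rng.sample(rows, sample_size)
        # Element 2, model building: fit a model on that subsample
        models.append(fit_1nn(subsample))
    return models


def ensemble_predict(models, x):
    # Element 3, combination: average the individual predictions
    return statistics.fmean([model(x) for model in models])


# Hypothetical toy data: y = 2x plus Gaussian noise
data_rng = random.Random(42)
data = [(i / 10, 2 * (i / 10) + data_rng.gauss(0, 1)) for i in range(100)]

models = build_ensemble(data)
print(ensemble_predict(models, 5.0))
```

Each individual model simply repeats the noisy target of its nearest training row, so its predictions jump around from subsample to subsample. The averaged prediction always lies between the lowest and highest individual predictions, which is where the smoothing effect of the ensemble comes from.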
As for the sampling, the approaches tested were:
- Pasting, where a number of models are built using subsamples (sampling without replacement) of the examples (the data rows)
- Bagging, where a number...