Stacking and blending
We started this chapter by talking about machine learning algorithms, which learn a function from a set of inputs and outputs. Using those algorithms, we learned functions that forecast our time series; from now on, we'll call their outputs base forecasts.
Why not use the same machine learning paradigm to learn the new function, F, that combines these base forecasts?
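Written out, learning F is an ordinary supervised learning problem. The notation here is assumed for illustration: $\hat{y}^{(1)}_t, \dots, \hat{y}^{(M)}_t$ are the $M$ base forecasts for time step $t$, $F_\theta$ is a combining function with learnable parameters $\theta$, and $\mathcal{L}$ is a loss such as squared error.

$$
\hat{y}_t = F_\theta\left(\hat{y}^{(1)}_t, \dots, \hat{y}^{(M)}_t\right), \qquad
\theta^\ast = \arg\min_\theta \sum_{t} \mathcal{L}\left(y_t,\; F_\theta\left(\hat{y}^{(1)}_t, \dots, \hat{y}^{(M)}_t\right)\right)
$$

A simple average is the special case $F(\cdot) = \frac{1}{M}\sum_{m=1}^{M} \hat{y}^{(m)}_t$, where nothing is learned; a linear meta model instead learns one weight per base forecast (plus an intercept).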
This is exactly what we do in stacking (often called stacked generalization): we train another learning algorithm on the predictions of some base learners so that it learns how to combine them. This second-level model is often called a stacked model or a meta model, and it frequently performs as well as or better than the individual base learners. Blending is very similar; the only difference is the way we split the data. Blending trains the meta model on base predictions for a single holdout set, whereas stacking typically uses out-of-fold predictions generated through cross-validation.
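To make the difference in data splits concrete, here is a minimal Python sketch using NumPy. It assumes a toy regression problem (a noisy sine curve) rather than a real time series, uses polynomial fits of degree 1, 3, and 9 as stand-in base learners, and uses least-squares linear regression as the meta model. All of these choices, and the function names `base_predictions`, `fit_meta`, `fit_blending`, and `fit_stacking`, are hypothetical illustrations rather than a prescribed implementation.

```python
import numpy as np

# Hypothetical base learners: polynomial fits of increasing degree.
DEGREES = (1, 3, 9)


def base_predictions(x_fit, y_fit, x_eval):
    """Fit each base learner on (x_fit, y_fit) and predict at x_eval.

    Returns shape (len(x_eval), len(DEGREES)): one column per base learner,
    which are the features the meta model learns from.
    """
    return np.column_stack(
        [np.polyval(np.polyfit(x_fit, y_fit, d), x_eval) for d in DEGREES]
    )


def fit_meta(preds, y):
    """Hypothetical meta model: linear regression with intercept (least squares)."""
    design = np.column_stack([np.ones(len(preds)), preds])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def apply_meta(weights, preds):
    return weights[0] + preds @ weights[1:]


def fit_blending(x, y, holdout_frac=0.3, seed=0):
    """Blending: base learners train on one split, the meta model on one holdout."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_hold = int(len(x) * holdout_frac)
    hold_idx, train_idx = idx[:n_hold], idx[n_hold:]
    hold_preds = base_predictions(x[train_idx], y[train_idx], x[hold_idx])
    weights = fit_meta(hold_preds, y[hold_idx])
    # Base learners stay trained on the non-holdout split only.
    return lambda x_new: apply_meta(
        weights, base_predictions(x[train_idx], y[train_idx], x_new)
    )


def fit_stacking(x, y, n_folds=5, seed=0):
    """Stacking: the meta model learns from out-of-fold (OOF) predictions."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), n_folds)
    oof = np.empty((len(x), len(DEGREES)))
    for fold in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), fold)
        # Each row is predicted by base learners that never saw that row.
        oof[fold] = base_predictions(x[train_idx], y[train_idx], x[fold])
    weights = fit_meta(oof, y)
    # At prediction time, the base learners are refit on all training data.
    return lambda x_new: apply_meta(weights, base_predictions(x, y, x_new))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.uniform(-1, 1, 200)
    y = np.sin(3 * x) + rng.normal(0, 0.1, 200)
    x_test = np.linspace(-0.9, 0.9, 50)
    y_test = np.sin(3 * x_test)
    for name, model in [("blending", fit_blending(x, y)), ("stacking", fit_stacking(x, y))]:
        print(name, "test MSE:", np.mean((model(x_test) - y_test) ** 2))
```

The only thing that differs between `fit_blending` and `fit_stacking` is where the meta model's training rows come from: a single holdout slice versus out-of-fold predictions that cover every training row. For real time series, the random permutation would normally be replaced by time-ordered splits so that no base learner is trained on data from the future.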
Although the idea originated with Wolpert in 1992, Leo Breiman formalized it, in the form it is used today, in his 1996 paper...