Stacking models together
Stacking was first mentioned in David Wolpert’s paper (Wolpert, D. H. Stacked generalization. Neural Networks 5.2, 1992), but it took years before the idea became widely accepted and common (only with release 0.22 in December 2019, for instance, did Scikit-learn implement a stacking wrapper). Its eventual popularity was due principally to the Netflix competition first, and to Kaggle competitions afterward.
In stacking, you always have a meta-learner. This time, however, it is not trained on a holdout but on the entire training set, thanks to the out-of-fold (OOF) prediction strategy. We already discussed this strategy in Chapter 6, Designing Good Validation. In OOF prediction, you start from a replicable k-fold cross-validation split. Replicable means that, either by recording which cases fall into the training and test sets at each round or by relying on the reproducibility assured by a fixed random seed, you can replicate the same validation scheme for each model you need to be part...
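To make the OOF idea concrete, here is a minimal sketch, assuming scikit-learn and a synthetic binary classification dataset. The helper `oof_predictions`, the two base models, and the logistic-regression meta-learner are hypothetical choices made for illustration, not a prescribed recipe. The key points are that a single seeded `KFold` split is materialized once and reused by every base model, and that each training row receives a prediction only from a model that did not see it during fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def oof_predictions(model, X, y, folds):
    """Hypothetical helper: out-of-fold positive-class probabilities for every training row."""
    oof = np.zeros(len(X))
    for train_idx, valid_idx in folds:
        model.fit(X[train_idx], y[train_idx])
        # Each row is predicted only by a model that never saw it during training
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
    return oof


# Synthetic data, used here only as an assumption for the example
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A fixed random_state makes the split replicable; storing the indices
# in a list lets every base model reuse exactly the same folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
folds = list(kf.split(X))

# Illustrative base models (an assumption, any estimators with predict_proba work)
base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

# One column of OOF predictions per base model, covering the whole training set
meta_features = np.column_stack(
    [oof_predictions(m, X, y, folds) for m in base_models]
)

# The meta-learner is trained on all training rows via their OOF predictions
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y)
```

The sketch covers only the training side: to score new data, each base model also needs predictions on the test set, for instance by averaging the predictions of its fold models or by refitting it on the full training set. The Scikit-learn stacking wrapper mentioned above (`StackingClassifier`) automates a similar procedure through its `cv` parameter.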