Ensembling techniques
Ensemble learning, or ensembling, is the process of combining multiple predictive models to produce a supermodel that is more accurate than any individual model on its own.
- Regression: take the average of the predictions from the individual models.
- Classification: take a vote and use the most common prediction, or take the average of the predicted probabilities (see the sketch after this list).
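As a minimal sketch of these combination rules (the arrays of per-model predictions below are hypothetical, purely for illustration):

```
import numpy as np

# hypothetical predictions from three regression models, shape (n_observations, n_models)
reg_preds = np.array([[1.2, 1.1, 1.3],
                      [2.0, 1.8, 2.1]])
avg_pred = reg_preds.mean(axis=1)                        # regression: average the predictions

# hypothetical class predictions (0 or 1) from three classifiers
clf_preds = np.array([[1, 0, 1],
                      [0, 0, 1]])
vote_pred = (clf_preds.mean(axis=1) >= 0.5).astype(int)  # classification: majority vote

# hypothetical predicted probabilities of class 1 from the same three classifiers
clf_probs = np.array([[0.9, 0.4, 0.7],
                      [0.2, 0.3, 0.6]])
avg_prob = clf_probs.mean(axis=1)                        # classification: average the probabilities
```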
Imagine that we are working on a binary classification problem (predicting either 0 or 1).
```
# ENSEMBLING
import numpy as np

# set a seed for reproducibility
np.random.seed(12345)

# generate 1000 random numbers (between 0 and 1) for each model, representing 1000 observations
mod1 = np.random.rand(1000)
mod2 = np.random.rand(1000)
mod3 = np.random.rand(1000)
mod4 = np.random.rand(1000)
mod5 = np.random.rand(1000)
```
Now, we simulate five different learning models that each have about 70% accuracy, as follows:
```
# each model independently predicts 1 (the "correct response") if its random number was at least 0.3
preds1 = np.where(mod1 >= 0.3, 1, 0)
preds2 = np.where(mod2 >= 0.3, 1, 0)
preds3 = np.where(mod3 >= 0.3, 1, 0)
preds4 = np.where(mod4 >= 0.3, 1, 0)
preds5 = np.where(mod5 >= 0.3, 1, 0)
```
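Because each random number is uniform on [0, 1], roughly 70% of the observations are at least 0.3, which is what gives every simulated model its ~70% accuracy. A minimal sketch of checking that, and of combining the five models by majority vote (assuming the correct label is 1 for every observation, as described above):

```
# accuracy of each individual model: the fraction of observations predicted as 1
for i, preds in enumerate([preds1, preds2, preds3, preds4, preds5], start=1):
    print(f"model {i} accuracy: {preds.mean():.3f}")      # each should be roughly 0.70

# majority vote: predict 1 when at least 3 of the 5 models predict 1
ensemble_preds = np.where(preds1 + preds2 + preds3 + preds4 + preds5 >= 3, 1, 0)
print(f"ensemble accuracy: {ensemble_preds.mean():.3f}")  # typically higher than any single model
```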