Ensembling techniques
Ensemble learning, or ensembling, is the process of combining multiple predictive models into a single combined model (a "supermodel") that is often more accurate than any of the individual models on its own. The individual predictions are typically combined as follows:
- Regression: We take the average of the models' predictions
- Classification: We take a majority vote and use the most common predicted class, or we average the predicted probabilities and pick the most probable class
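The three combination rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's own code: the function names `ensemble_regression`, `ensemble_hard_vote`, and `ensemble_soft_vote` are hypothetical, and the sketch assumes each model's outputs have already been stacked into one array (one row per model).

```python
import numpy as np

def ensemble_regression(predictions):
    """Average regression predictions; shape (n_models, n_samples)."""
    return np.mean(predictions, axis=0)

def ensemble_hard_vote(labels):
    """Majority vote over non-negative integer class labels; shape (n_models, n_samples)."""
    labels = np.asarray(labels)
    # count the votes for each class in every column (observation), keep the most common;
    # on a tie, argmax returns the smallest class label
    return np.array([np.bincount(col).argmax() for col in labels.T])

def ensemble_soft_vote(probas):
    """Average predicted probabilities; shape (n_models, n_samples, n_classes)."""
    # mean over models, then the most probable class for each observation
    return np.mean(probas, axis=0).argmax(axis=1)

reg_preds = np.array([[2.0, 3.0], [4.0, 5.0], [3.0, 4.0]])
print(ensemble_regression(reg_preds))   # [3. 4.]

labels = np.array([[0, 1, 1], [1, 1, 0], [1, 0, 0]])
print(ensemble_hard_vote(labels))       # [1 1 0]

probas = np.array([[[0.6, 0.4], [0.2, 0.8]],
                   [[0.3, 0.7], [0.4, 0.6]]])
print(ensemble_soft_vote(probas))       # [1 1]
```

Using an odd number of models for a binary problem, as in the example that follows, means a hard vote can never tie.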
Imagine that we are working on a binary classification problem (predicting either 0 or 1):
```python
# ENSEMBLING
import numpy as np

# set a seed for reproducibility
np.random.seed(12345)

# generate 2000 random numbers (between 0 and 1) for each model, representing 2000 observations
mod1 = np.random.rand(2000)
mod2 = np.random.rand(2000)
mod3 = np.random.rand(2000)
mod4 = np.random.rand(2000)
mod5 = np.random.rand(2000)
```
Now, we simulate five different learning models, each with about 70% accuracy, as follows:
# each model independently predicts 1 (the "correct response") if random number was at...
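To see why voting can help, consider an idealized setup: five models that are each correct 70% of the time, with errors that are completely independent of one another (an assumption that real models rarely satisfy). For binary labels, the majority vote is correct whenever at least three of the five models are correct, so its accuracy follows the binomial distribution:

$$P(\text{vote correct}) = \sum_{k=3}^{5} \binom{5}{k} (0.7)^k (0.3)^{5-k} \approx 0.837$$

The sketch below, which is a separate illustration with its own hypothetical variable names rather than the book's code, computes that probability exactly and checks it with a quick Monte Carlo simulation.

```python
import numpy as np
from math import comb

n_models = 5
p_correct = 0.7  # assumed accuracy of each individual model

# exact probability that a majority (at least 3 of 5) of independent models is correct
majority = n_models // 2 + 1
exact = sum(comb(n_models, k) * p_correct**k * (1 - p_correct)**(n_models - k)
            for k in range(majority, n_models + 1))
print(round(exact, 4))  # 0.8369

# Monte Carlo check: True where a model's prediction matches the true label
rng = np.random.default_rng(0)
n_obs = 100_000
correct = rng.random((n_models, n_obs)) < p_correct
# with binary labels, the vote is right exactly when most models are right
vote_correct = correct.sum(axis=0) >= majority
print(vote_correct.mean())  # close to 0.837
```

If the models' errors are correlated, the gain shrinks toward the accuracy of a single model, which is why ensembles tend to work best with diverse models.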