In this section, we will classify the dataset using bagging. As we have previously shown, decision trees with a maximum depth of five are optimal; thus, we will use these trees as the base learners for our bagging example.
We would like to optimize the ensemble's size. We will generate validation curves on the original train set by testing ensemble sizes in the range [5, 30]. The resulting curves are depicted in the following graph:
Validation curves for the original train set, for various ensemble sizes
We observe that variance is minimized for an ensemble size of 10; thus, we will use ensembles of size 10.
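As a rough illustration of how such validation curves could be produced, the following minimal sketch uses scikit-learn's validation_curve with a bagging ensemble of depth-five decision trees. It is not the exact code behind the graph: the synthetic data from make_classification, the variable names, the step of 5 between sizes, the 5-fold cross-validation, and the accuracy metric are all assumptions made for illustration. The spread of the validation scores across folds is used here as a simple proxy for variance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption); substitute the real train set here.
x_train, y_train = make_classification(n_samples=500, n_features=10,
                                       random_state=0)

# Bagging ensemble of depth-5 trees. The base learner is passed positionally
# because its keyword name differs between scikit-learn versions.
base_learner = DecisionTreeClassifier(max_depth=5)
ensemble = BaggingClassifier(base_learner, random_state=0)

# Ensemble sizes to test: 5, 10, ..., 30 (the step of 5 is an assumption).
sizes = np.arange(5, 31, 5)

# Each returned array has one row per ensemble size and one column per fold.
train_scores, val_scores = validation_curve(
    ensemble, x_train, y_train,
    param_name='n_estimators', param_range=sizes,
    cv=5, scoring='accuracy')

# Mean and spread of the validation scores for each ensemble size.
val_mean = val_scores.mean(axis=1)
val_std = val_scores.std(axis=1)
for size, mean, std in zip(sizes, val_mean, val_std):
    print('size=%2d  validation accuracy=%.3f  std=%.3f' % (size, mean, std))

# Ensemble size with the smallest spread across folds.
best_size = sizes[np.argmin(val_std)]
print('Smallest spread at ensemble size', best_size)
```

On the synthetic data, the size reported by this sketch need not be 10; the point is only to show how the mean and spread of the scores can be inspected for each candidate size.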
The following code loads the data and libraries (Section 1), splits the data into train and test sets, and fits and evaluates the ensemble on the original dataset (Section 2) and the reduced-features dataset (Section 3):
# --- SECTION 1 ---
# Libraries and data loading
import numpy as...