Exploring embedded feature selection methods
Embedded methods are built into the models themselves: they select features naturally as part of training. For any model with such intrinsic properties, you can leverage them to recover which features were selected:
- Tree-based models: For instance, we have used the following code many times to count the number of features the RF models actually use, which is evidence that feature selection occurs naturally during the learning process:
sum(reg_mdls[mdlname]['fitted'].feature_importances_ > 0)
XGBoost's RF uses gain by default to compute feature importance; this is the average decrease in error across all the splits in which a feature was used. We can raise the threshold above 0 to select even fewer features according to this relative contribution (see the first sketch after this list). However, by constraining the trees' depth, we had already forced the model to choose fewer features.
- Regularized models with coefficients: L1 regularization drives the coefficients of uninformative features to exactly zero, so the nonzero coefficients identify the selected features (see the second sketch after this list). We will study this further in Chapter 12, Monotonic...
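The following is a minimal, self-contained sketch of the tree-based thresholding idea, since the reg_mdls dictionary from earlier chapters is not reproduced here. It assumes xgboost and scikit-learn are installed; the synthetic dataset, the XGBRFRegressor settings, and the 0.05 threshold are illustrative choices, not the chapter's actual setup:

import numpy as np
from sklearn.datasets import make_regression
from xgboost import XGBRFRegressor

# Synthetic stand-in data: 20 features, only 5 of them informative
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=42)

# Random-forest-style XGBoost model; the shallow max_depth already
# limits how many features the ensemble can use
rf = XGBRFRegressor(n_estimators=100, max_depth=4, random_state=42)
rf.fit(X, y)

# Count features the model actually used, as in the snippet above
importances = rf.feature_importances_
print('features used:', np.sum(importances > 0))

# Raising the threshold above 0 keeps only features whose gain-based
# importance clears the bar, selecting even fewer of them
selected_idx = np.where(importances > 0.05)[0]
print('features kept at 0.05:', len(selected_idx))
X_selected = X[:, selected_idx]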
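As a preview of the regularized case, here is a second minimal sketch, again assuming scikit-learn and the same kind of synthetic data; the alpha value is an arbitrary illustration. Counting the nonzero coefficients after fitting a lasso reveals which features the model selected during training:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data again: 20 features, 5 informative
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=42)

# Standardize so the L1 penalty treats all features on the same scale;
# alpha controls how aggressively coefficients are driven to zero
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

# Nonzero coefficients mark the features the model selected
coefs = model.named_steps['lasso'].coef_
print('features selected:', np.sum(coefs != 0), 'of', len(coefs))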