Exploring feature selection techniques
In the previous recipe, we saw how to evaluate the importance of the features used to train ML models. We can use that knowledge to carry out feature selection, that is, to keep only the most relevant features and discard the rest.
Feature selection is a crucial part of any machine learning project. First, it allows us to remove features that are either completely irrelevant or contribute little to a model's predictive capabilities, which benefits us in multiple ways. Probably the most important benefit is that such unimportant features can actively hurt the model's performance, as they introduce noise and contribute to overfitting. As we have already established: garbage in, garbage out. Additionally, using fewer features often translates into shorter training times and helps us avoid the curse of dimensionality.
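As a quick illustration of this idea (a minimal sketch rather than the recipe's own code), the snippet below performs importance-based selection with scikit-learn's SelectFromModel. The synthetic dataset, the random forest used as the source of importances, and the mean-importance threshold are all assumptions made for the example.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Toy dataset: 20 features, of which only 5 are informative (an assumption for this demo)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)

# Score the features with a random forest's impurity-based importances
# and keep only those whose importance exceeds the mean importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="mean",
)
X_selected = selector.fit_transform(X, y)

print(f"Kept {X_selected.shape[1]} out of {X.shape[1]} features")

After fitting, the selector's get_support() method returns a Boolean mask of the retained columns, which is convenient for subsetting a pandas DataFrame by column name.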
Second, we should follow Occam’s razor and keep our models simple and explainable...