Feature selection
Feature selection ranks variables or features by importance: we train a predictive model on them and then determine which variables were most relevant to that model. While each model often has its own set of important features, here we will use a random forest model to identify variables that are likely to matter in general for classification-based predictions.
We perform feature selection for several reasons, which include:
Removing redundant or irrelevant features without losing much information
Preventing the overfitting that comes from using too many features
Reducing the variance that excess features contribute to the model
Reducing model training and convergence time
Building simple, easy-to-interpret models
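The importance ranking described above can be sketched as follows. This is a minimal example, assuming scikit-learn as the toolkit and its bundled breast cancer dataset as stand-in data; the text itself names neither.

```python
# Sketch: ranking features by importance with a random forest classifier.
# Assumes scikit-learn; the dataset here is only illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target

# Train the predictive model on all features.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Rank variables by the model's impurity-based importance scores.
ranked = sorted(
    zip(data.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

The importance scores sum to 1, so they can be read as each feature's relative share of the model's decisions; low-scoring features are candidates for removal.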
We will use a recursive feature elimination algorithm for feature selection, together with an evaluation algorithm based on a predictive model, where we repeatedly...
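One common way to realize this combination, assuming scikit-learn, is `RFECV`, which pairs recursive feature elimination with cross-validated evaluation: at each round the least important features are dropped, the model is refit and scored, and the subset size with the best score is kept. The estimator and dataset below are illustrative choices, not prescribed by the text.

```python
# Sketch: recursive feature elimination with cross-validated evaluation.
# Assumes scikit-learn's RFECV; dataset and estimator are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = load_breast_cancer(return_X_y=True)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=5,               # features removed per elimination round
    cv=3,                 # cross-validation folds used to score each subset
    scoring="accuracy",
)
selector.fit(X, y)

print("features selected:", selector.n_features_)
print("selected mask:", selector.support_)
```

`selector.support_` is a boolean mask over the original columns, so `X[:, selector.support_]` yields the reduced feature matrix for downstream training.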