An ML model learns patterns in the data from a handful of critical features. The remaining features mostly add noise, which can lower the model's accuracy and cause it to overfit the training data, so selecting the right features is essential. Working with a reduced set of important features also shortens model training time.
The following are some of the ways to select the right features prior to creating a model:
- Identify the correlated variables and remove one variable from each highly correlated pair
- Remove the features with low variance
- Measure information gain for the available set of features and choose the top N features accordingly
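The three filters above can be sketched in plain Python as follows. This is a minimal illustration using only the standard library, not a reference implementation: the function names, the thresholds (0.9 for correlation, 0.01 for variance), and the toy data are all assumptions made for the example. The information-gain helper assumes discrete feature values; continuous features would need to be binned first.

```python
import math
from collections import Counter
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def drop_low_variance(columns, min_variance=0.01):
    """Keep features whose population variance exceeds min_variance."""
    return [name for name, values in columns.items() if pvariance(values) > min_variance]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_n_by_information_gain(columns, labels, n):
    """Rank features by information gain and return the names of the top n."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return ranked[:n]


# Toy numeric data (hypothetical): age_months duplicates age, constant never varies.
numeric = {
    "age": [25, 32, 47, 51, 62, 38],
    "age_months": [300, 384, 564, 612, 744, 456],
    "constant": [1, 1, 1, 1, 1, 1],
    "owns_home": [0, 1, 0, 1, 1, 0],
}
after_correlation = drop_correlated(numeric)
after_variance = drop_low_variance(numeric)

# Toy categorical data (hypothetical): outlook determines the label, windy does not.
categorical = {
    "outlook": ["sun", "sun", "rain", "rain", "cloud", "cloud"],
    "windy": [0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 1, 1]
best = top_n_by_information_gain(categorical, labels, n=1)

print(after_correlation, after_variance, best)
```

Note that the correlation filter keeps the first feature of each correlated pair it encounters, so the column order decides which one survives. The filters are also independent of one another: the constant column passes the correlation check and is only removed by the variance check.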
Also, after creating a baseline model, we can use some of the methods below to select the right features:
- Use linear regression and select variables based on p-values
- Use stepwise selection for linear regression...