Introduction
In the previous two chapters (on regression and classification), we focused on understanding and implementing various supervised learning algorithms on a given dataset for a particular problem.
In this chapter, we will focus on using the features of a dataset effectively to build the best-performing model. In many datasets, the feature space is quite large (there are many features). Model performance suffers in such cases because meaningful patterns become harder to find and the data often contains considerable noise. Feature selection methods address this by assigning an importance score to each feature; we can then pick the top 10 or 15 features (or more) by score and use only those to build our model.
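As a minimal sketch of this idea, assuming a scikit-learn workflow and a synthetic dataset standing in for a wide feature space (the sizes and the choice of the ANOVA F-score are illustrative, not prescribed by the text), features can be scored against the target and the top 15 retained as follows:

# A minimal sketch of score-based feature selection; the dataset here is
# synthetic and the parameter choices are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data standing in for a wide feature space (50 features).
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=10, random_state=42)

# Score every feature against the target and keep the 15 highest-scoring ones.
selector = SelectKBest(score_func=f_classif, k=15)
X_selected = selector.fit_transform(X, y)

# Inspect the scores and which columns survived the selection.
scores = selector.scores_
kept = np.flatnonzero(selector.get_support())
print("Selected feature indices:", kept)
print("Reduced shape:", X_selected.shape)   # (500, 15)

The reduced matrix X_selected can then be fed to any of the regression or classification models from the earlier chapters in place of the full feature set.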
Another possibility is to create new variables as linear combinations of all the input variables. This preserves information from every variable while reducing the dimensionality of the feature space. However, such a...