Modeling data preparation
In this stage of the machine learning life cycle, we need to finalize the features and data points we want to use for modeling, as well as our strategies for evaluating and testing the model.
Feature selection and extraction
The original features that were normalized and scaled in previous steps can now be processed further to increase the likelihood of ending up with a high-performance model. In general, there are two options. Features can be sub-selected using a feature selection method, meaning some of them are thrown out, or they can be used to generate new features, which is traditionally called feature extraction.
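To make the distinction concrete, here is a minimal sketch that applies both ideas to the same data. It rests on several assumptions that are not part of the text: the toy dataset is synthetic, the helper names `select_top_k` and `extract_components` are hypothetical, selection is done with a simple correlation ranking, and extraction is done with principal component analysis computed through NumPy's SVD. Many other selection and extraction methods exist, so treat this as one possible illustration rather than a recommended recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 data points and 50 features,
# of which only the first three actually drive the target.
n_samples, n_features = 500, 50
X = rng.normal(size=(n_samples, n_features))
y = (2.0 * X[:, 0] - 2.0 * X[:, 1] + 2.0 * X[:, 2]
     + rng.normal(scale=0.1, size=n_samples))


def select_top_k(X, y, k):
    """Feature selection: keep the k original columns whose absolute
    Pearson correlation with y is largest; the rest are thrown out."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.sort(np.argsort(-np.abs(corr))[:k])
    return X[:, keep], keep


def extract_components(X, k):
    """Feature extraction: build k new features as projections of the
    centered data onto its top-k principal directions (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


X_selected, kept = select_top_k(X, y, k=3)   # a subset of the original columns
X_extracted = extract_components(X, k=3)     # new columns, mixtures of all 50

print("kept column indices:", kept)
print("selected shape:", X_selected.shape)
print("extracted shape:", X_extracted.shape)
```

Both results have the same shape, but they differ in kind: the selected matrix contains unchanged original columns, while each extracted column is a linear combination of all the original features and no longer corresponds to a single one of them.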
Feature selection
The goal of feature selection is to reduce the number of features, that is, the dimensionality of your data, while keeping the features that are information-rich. For example, if we have 20,000 features and only 500 data points, there is a high chance that most of the original 20,000 features are not informative when used to build a supervised learning model. The...