Dimensionality reduction
Dimensionality reduction, as the name suggests, reduces the dimensionality of your dataset. That is, these techniques try to compress the dataset such that only the most useful information is retained, and the rest is discarded.
By the dimensionality of a dataset, we mean the number of features it has. When the dimensionality is high, that is, when there are too many features, several problems can arise:
- If there are more features than observations in the dataset, the problem becomes ill-defined, and some linear models, such as ordinary least squares (OLS) regression, cannot handle this case
- Some features may be correlated and cause problems with training and interpreting the models
- Some of the features can turn out to be noisy or irrelevant and confuse the model
- Distances become less meaningful in high dimensions, as pairwise distances between points tend to concentrate around similar values; this problem is commonly referred to as the curse of dimensionality
- Processing a large set of features may be computationally expensive
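As a quick illustration of what such a technique does, here is a minimal sketch using principal component analysis (PCA) from scikit-learn to compress a wide dataset into a handful of components. The synthetic data, the choice of PCA, and the target of 10 components are illustrative assumptions, not something prescribed here.

```python
# A minimal sketch of dimensionality reduction with PCA (scikit-learn).
# The synthetic dataset and the target of 10 components are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 200 samples with 50 observed features, driven by only 5 underlying factors
latent = rng.normal(size=(200, 5))           # "true" low-dimensional structure
mixing = rng.normal(size=(5, 50))            # maps latent factors to 50 observed features
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))  # add a little noise

# Compress the 50 observed features down to 10 components
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # (200, 50) -> (200, 10)
print("variance retained:", pca.explained_variance_ratio_.sum().round(3))
```

Because only a few latent factors generate the data, the reduced representation keeps almost all of the variance while using far fewer features.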
In the...