Feature selection
After initial modeling, we will often have a large number of features to choose from, but we wish to select only a small subset. There are many possible reasons for this:
- Reducing complexity: Many data mining algorithms need significantly more time and resources as the number of features increases. Reducing the number of features is a great way to make an algorithm run faster or with fewer resources.
- Reducing noise: Adding extra features doesn't always lead to better performance. Extra features may confuse the algorithm, which can find correlations and patterns in the training data that have no real meaning. This can happen in datasets of any size, although smaller datasets are especially susceptible, because spurious correlations arise more easily by chance. Choosing only the appropriate features is a good way to reduce the chance of finding such meaningless correlations.
- Creating readable models: While many data mining algorithms will happily compute an answer for models with thousands of features, the results may be difficult for a human to interpret. In these cases, it may be worth using fewer features so that a human can understand the resulting model.
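To make this concrete, here is a minimal sketch of selecting a small subset of features with scikit-learn's SelectKBest transformer; the dataset and the choice of k=5 are illustrative assumptions, not prescriptions from the text:

```python
# A minimal feature selection sketch using scikit-learn.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Load a sample dataset with 30 numeric features (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)
print("Original feature count:", X.shape[1])  # 30

# Keep only the 5 features that score highest on a chi-squared test
# against the target. Note that chi2 requires non-negative feature
# values, which holds for this dataset's measurements.
selector = SelectKBest(score_func=chi2, k=5)
X_reduced = selector.fit_transform(X, y)
print("Reduced feature count:", X_reduced.shape[1])  # 5
```

A univariate score such as chi2 evaluates each feature independently, which keeps the selection fast and simple; the trade-off is that it cannot account for features that are only informative in combination.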