For supervised datasets, manual inspection works fine when the dataset has only a few features. As the feature count grows, manual inspection becomes impractical, and we need to apply feature selection techniques, such as the chi-square test or random forest feature importance, to deal with the volume of features. We can also use an autoencoder to narrow down the relevant features. Remember that each feature should make a fair contribution toward the prediction outcomes. So, we need to remove noise features from the raw dataset and keep everything else as is, including any features whose relevance is uncertain. In this recipe, we will walk through the steps to identify anomalies in the data.
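To make the chi-square idea concrete, the following is a minimal sketch in plain Python, not the recipe's own implementation. It scores each categorical feature by the Pearson chi-square statistic against the class label; a score near zero suggests the feature is independent of the label and is a candidate noise feature. The toy dataset, the feature names (`has_link`, `weekday`), and the helper `chi_square_score` are all hypothetical and exist only for illustration.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts per (value, label)
    feature_totals = Counter(feature_values)         # marginal counts per feature value
    label_totals = Counter(labels)                   # marginal counts per label

    score = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            # Count we would expect if the feature and label were independent
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


# Hypothetical toy dataset: eight labeled samples, two categorical features
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]
features = {
    "has_link": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue"],
}

# Score every feature and rank from most to least label-dependent
scores = {name: chi_square_score(values, labels) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

In this toy data, `has_link` perfectly separates the two classes while `weekday` is evenly spread across them, so `weekday` ranks last. On a real dataset, the cutoff below which a feature counts as noise is a judgment call; borderline features are better kept than dropped.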
Removing anomalies from the data
How to do it...
- Leave out all the noise...