Noise in data management
Missing data and contradictory annotations are only one kind of data problem. In many cases, large datasets generated by feature extraction algorithms contain more information than the model needs: some features are superfluous and do not contribute to the end result of the algorithm. Many machine learning models can tolerate noise in the features, called attribute noise, but too many features are costly in terms of training time, storage, and even data collection itself.
Therefore, we should also pay attention to attribute noise: identify it first, and then remove it.
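
To make "identify and then remove" concrete, here is a minimal sketch of one simple heuristic, which is not necessarily the method this text has in mind: drop features that are (almost) constant, and drop features that are nearly perfect linear copies of a feature already kept. The function names `pearson` and `select_features`, the thresholds `min_variance` and `max_abs_corr`, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of the features to keep, in their original order.

    columns: dict mapping a feature name to a list of numeric values.
    Both thresholds are illustrative defaults, not recommended values.
    """
    kept = []
    for name, values in columns.items():
        # A (nearly) constant feature carries no information.
        if pvariance(values) <= min_variance:
            continue
        # A feature that duplicates an already kept one is redundant.
        if any(abs(pearson(values, columns[k])) >= max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy dataset: one duplicated feature and one constant feature.
data = {
    "width": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width_mm": [10.0, 20.0, 30.0, 40.0, 50.0],  # same as width, other unit
    "sensor_id": [7.0, 7.0, 7.0, 7.0, 7.0],      # constant
    "height": [2.0, 1.0, 4.0, 3.0, 6.0],
}
kept = select_features(data)
print(kept)
```

Note that this filter is unsupervised: it only finds features that are uninformative or redundant on their own terms. It does not check whether a remaining feature actually helps predict the target, which would require comparing features against the labels or against model performance.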