Detecting data anomalies
Detecting anomalies (and their counterpart, novelties) is an ongoing requirement because anomalies occur all the time. However, detecting and removing anomalies raises a further concern: if you remove novelties from the dataset because you mistake them for anomalies, you may miss an important trend. Consequently, detecting anomalies and investigating possible novelties go hand in hand. The most important place to start is the data itself, looking for values that don’t appear to belong. Figure 6.2 lists some common techniques for detecting outliers (the table is far from complete because many other techniques exist):
Method | Type | Description
Cook’s distance | Model-specific | This estimates the variations in regression coefficients...
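Cook’s distance is model-specific because it requires a fitted regression. It measures how much the fitted model would change if a single observation were left out, combining that point’s residual with its leverage. The following pure-Python sketch assumes the simplest case, a linear regression with one predictor plus an intercept (so p = 2 parameters and the leverage has a closed form). The function name `cooks_distance`, the sample data, and the 4/n cutoff (a common rule of thumb, not something the table specifies) are illustrative assumptions. The points are flagged rather than dropped, so that a flagged value can still be reviewed as a possible novelty.

```python
import math
from statistics import fmean


def cooks_distance(x, y):
    """Hypothetical helper: Cook's distance for each point of y = a + b*x.

    Assumes one predictor plus an intercept (p = 2), so the leverage of
    point i is h_i = 1/n + (x_i - mean(x))**2 / Sxx.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    p = 2  # intercept + slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")

    # Ordinary least-squares fit.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual variance s^2 = SSE / (n - p).
    mse = sum(e * e for e in residuals) / (n - p)
    if mse == 0:
        return [0.0] * n  # perfect fit: no point changes the model

    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        if 1 - h < 1e-12:
            distances.append(math.nan)  # undefined when leverage is 1
            continue
        # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
        distances.append(e * e / (p * mse) * h / (1 - h) ** 2)
    return distances


# Illustrative data: roughly y = 2x, with one suspicious final value.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 40.0]

distances = cooks_distance(x, y)
# Flag (do not delete) points above the assumed 4/n rule-of-thumb cutoff.
flagged = [i for i, d in enumerate(distances) if d > 4 / len(distances)]
print(flagged)  # → [9]
```

Only the last observation is flagged here. Whether it is an error to remove or an early sign of a real change in the data is a question the distance alone cannot answer, which is why flagged points are kept for investigation.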