What is anomaly and outlier detection?
Anomaly detection, often related to outlier detection and novelty detection, is the identification of items, events, or observations that deviate considerably from an expected pattern observed in a homogeneous dataset.
Anomaly detection is about predicting the unknown.
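To make "deviate considerably from an expected pattern" concrete, here is a minimal sketch for one-dimensional data. It uses the modified z-score, one simple robust technique and not the method described in this text. The score is 0.6745 · (x − median) / MAD, where MAD is the median absolute deviation, and the conventional cutoff is |score| > 3.5. The function names, the threshold default, and the sample times are hypothetical and chosen only for illustration.

```python
import statistics

def modified_z_scores(values):
    """Robust deviation score for each value: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    # Median absolute deviation: the median distance of each point from the median
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined for this sample")
    return [0.6745 * (x - med) / mad for x in values]

def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score magnitude exceeds the threshold."""
    scores = modified_z_scores(values)
    return [x for x, z in zip(values, scores) if abs(z) > threshold]

# Hypothetical 100-meter dash times, in seconds
times = [10.2, 10.5, 10.4, 10.9, 11.1, 10.7, 10.3, 10.8, 9.6, 15.0]
print(flag_outliers(times))  # prints [15.0]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter and can mask itself. Note that a score like this only measures how far a point lies from the bulk of the data. Deciding whether a flagged point is a legitimate outlier or an anomaly from a different process still requires knowledge of the domain.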
Whenever we find a discordant observation in the data, we could call it an anomaly or an outlier. Although the two words are often used interchangeably, they actually refer to two different concepts, as Ravi Parikh describes in one of his blog posts (https://blog.heapanalytics.com/garbage-in-garbage-out-how-anomalies-can-wreck-your-data/):
"An outlier is a legitimate data point that's far away from the mean or median in a distribution. It may be unusual, like a 9.6-second 100-meter dash, but still within the realm of reality. An anomaly is an illegitimate data point that's generated by a different process than whatever generated the rest...