Defining anomalies
In ML, an anomaly is data that lies outside the expected range. It may occur accidentally or someone may have put it there deliberately, but either way an anomaly is usually unexpected and potentially unwanted. Anomalies come in two forms:
- Outliers: Data that doesn’t fit in with the rest of the data is an outlier. Outliers come in many forms, but their defining characteristic is that they are unwanted because they skew any analysis performed while they remain in place.
- Novelties: Sometimes data lies outside the normal range but still fits in with the rest of the data. In this case, it represents a new example that must be considered as part of any analysis. Otherwise, the analysis fails to represent the true state of whatever it is supposed to bring to light.
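To make "outside the expected range" concrete, here is a minimal sketch that flags values falling beyond an interquartile-range fence. The fence rule (1.5 times the IQR), the function name `flag_anomalies`, and the sample readings are all assumptions chosen for illustration; they are one common way to define an expected range, not the only one.

```python
import statistics


def flag_anomalies(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    The multiplier k=1.5 is a conventional default, used here as an
    assumption about what counts as the "expected range".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical readings: seven typical values and one far from the rest.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 20.3, 20.2, 35.0]
anomalies = flag_anomalies(readings)
print(anomalies)
```

A range check like this only reports that 35.0 is anomalous. It cannot say whether the value is an outlier to discard or a novelty to keep; making that call takes knowledge of where the data came from.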
Part of the problem, then, is that both kinds of anomaly lie outside the normal range, but one is wanted and the...