In the second example, we'll take the opposite approach to the previous example. Instead of describing what typical fraud cases look like, we'll model the normal, expected behavior of the system. Anything that cannot be matched against this expected model will be considered anomalous.
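To make the idea concrete, here is a minimal sketch of modeling expected behavior and flagging whatever does not fit it. It is not the method developed in this chapter. It assumes that normal behavior can be summarized by a mean and a standard deviation, and it uses made-up traffic counts. The function names `fit_normal_model` and `is_anomalous`, and the threshold of three standard deviations, are illustrative choices only.

```python
from statistics import mean, stdev


def fit_normal_model(values):
    """Summarize normal behavior as a (mean, standard deviation) pair."""
    return mean(values), stdev(values)


def is_anomalous(value, model, threshold=3.0):
    """Return True if value lies more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    if sigma == 0:
        # Degenerate model: every training value was identical.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly request counts observed during normal operation.
normal_traffic = [102, 98, 101, 97, 103, 99, 100, 104, 96, 100]
model = fit_normal_model(normal_traffic)

# New observations are compared against the model of normal behavior.
new_observations = [101, 95, 180]
flags = [is_anomalous(v, model) for v in new_observations]
print(flags)
```

Only the last observation falls far enough from the learned mean to be flagged. Note that the model is built from normal data alone, with no examples of anomalies.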
Anomaly detection in website traffic
Dataset
We'll work with a publicly available dataset released by Yahoo! Labs, which is useful for discussing how to detect anomalies in time series data. For Yahoo, the main use case is detecting unusual traffic on Yahoo servers.
Even though Yahoo has announced that their data is publicly available, you have to apply to use it, and it takes about 24 hours before the approval...