Understanding temporal versus population analysis
We learned back in Chapter 1, Machine Learning for IT, that there are effectively two ways to consider something as anomalous:
- Whether or not something changes drastically with respect to its own behavior over time
- Whether or not something is drastically different when compared to its peers in an otherwise homogeneous population
By default, the former (which we'll simply call temporal analysis) is the mode used unless the over_field_name setting is specified in the detector config.
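To make the distinction concrete, here is a minimal sketch in Python of two detector definitions, one temporal and one population-based. The field name host.name, the 15m bucket span, and the helper analysis_mode are assumptions chosen for illustration; only the presence or absence of over_field_name is taken from the text above.

```python
import json


def analysis_mode(detector: dict) -> str:
    """Return 'population' if the detector sets over_field_name, else 'temporal'.

    Hypothetical helper for illustration; it is not part of any Elastic client.
    """
    return "population" if detector.get("over_field_name") else "temporal"


# Temporal analysis: no over_field_name, so each host is modeled
# against its own behavior over time.
temporal_detector = {
    "function": "count",
    "partition_field_name": "host.name",  # assumed field name
}

# Population analysis: over_field_name is set, so each host is
# compared against the collective behavior of its peers.
population_detector = {
    "function": "count",
    "over_field_name": "host.name",  # assumed field name
}

# An analysis_config fragment that a job definition might carry.
analysis_config = {
    "bucket_span": "15m",  # illustrative value
    "detectors": [population_detector],
    "influencers": ["host.name"],
}

if __name__ == "__main__":
    print(analysis_mode(temporal_detector))
    print(analysis_mode(population_detector))
    print(json.dumps(analysis_config, indent=2))
```

The only difference between the two detectors that matters for the mode is the over_field_name key; everything else in the sketch is scaffolding that would need to match your own data.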
Population analysis is useful for finding outliers in a variety of important use cases. For example, we might want to find machines that are logging more (or less) than similarly configured machines, as in the following scenarios:
- Incorrect configuration changes that have caused more errors to suddenly occur in the log file for the system or application.
- A system that might be compromised by malware may actually...