Anomaly detection job throughput considerations
Elastic ML is fast and scalable, but there is still a practical upper bound on the number of events per second that any anomaly detection job can process. That bound depends on two factors:
- The speed at which data can be delivered to the algorithms (that is, query performance)
- The speed at which the algorithms can chew through the data, given the desired analysis
For the latter, much of the performance is based upon the following:
- The function(s) chosen for the analysis; for example, count is faster than lat_long
- The bucket_span value chosen (longer bucket spans are faster than shorter bucket spans, because analyzing more buckets per unit of time compounds the per-bucket processing overhead, such as writing results)
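To make the bucket_span point concrete, here is a minimal sketch that counts how many buckets a job would have to process to cover one day of data at a few example bucket spans. The helper name bucket_count and the chosen spans are illustrative assumptions, not part of Elastic ML; the sketch only shows the arithmetic, not actual processing cost.

```python
def bucket_count(time_range_seconds: int, bucket_span_seconds: int) -> int:
    """Return the number of buckets needed to cover a time range.

    Hypothetical helper for illustration only; uses ceiling division so a
    partially filled final bucket is still counted.
    """
    if bucket_span_seconds <= 0:
        raise ValueError("bucket_span_seconds must be positive")
    return -(-time_range_seconds // bucket_span_seconds)


ONE_DAY_SECONDS = 24 * 60 * 60

# Example bucket spans (assumed values, expressed in seconds).
example_spans = {"5m": 5 * 60, "15m": 15 * 60, "1h": 60 * 60}

buckets_per_day = {
    label: bucket_count(ONE_DAY_SECONDS, seconds)
    for label, seconds in example_spans.items()
}

for label, count in buckets_per_day.items():
    print(f"bucket_span={label}: {count} buckets per day")
```

Going from a 5m to a 1h bucket span reduces the number of buckets per day from 288 to 24, so any fixed per-bucket overhead (such as writing results) is incurred 12 times less often over the same data.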
However, if you have a defined analysis set up and can't change it for other reasons, then there's not that much you can do unless you get creative and...