ML is awesome, and it is undoubtedly fast and scalable, but there is still a practical upper bound on the number of events per second that any ML job can process, and it depends on a couple of factors:
- The speed at which data can be delivered to the ML algorithms (that is, query performance); see the datafeed sketch after this list
- The speed at which the ML algorithms can chew through the data, given the desired analysis
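To make the first factor concrete: data reaches an ML job through a datafeed, which is simply a search run against your indices, so "query performance" here is the performance of that search. The following is a minimal sketch, with a hypothetical index pattern and job name; it assumes the job already exists (a matching job is sketched after the next list), and the query it runs (plus any aggregations you add to it) is what governs how quickly events can be handed to the job:

```
# Minimal datafeed sketch: the query it runs determines
# how fast events can be delivered to the ML job.
PUT _ml/datafeeds/datafeed-cheap_count_job
{
  "job_id": "cheap_count_job",
  "indices": ["my-metrics-*"],
  "query": {
    "match_all": {}
  }
}
```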
For the latter, much of the performance is based upon the following:
- The function(s) chosen for the analysis (for example, count is faster than lat_long)
- The chosen bucket_span (longer bucket spans are faster than shorter ones, because a shorter span means more buckets are analyzed per unit of time, and every bucket carries fixed processing overhead, such as writing results); both levers appear in the job sketch after this list
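Both levers live in the job's analysis_config. Here is a minimal sketch of a deliberately cheap setup, with a hypothetical job name and time field: a count detector paired with a relatively long bucket_span. Swapping the function to lat_long (which also needs a field_name pointing at a geo_point field) or shrinking the bucket_span would push the job toward the slower end of both levers.

```
# A cheap analysis setup: count function, 15-minute buckets.
PUT _ml/anomaly_detectors/cheap_count_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```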
However, if you have a defined analysis setup and can't really change it for other reasons, then there's not really...