In the previous section, our recipe provided a snapshot count of events; that is, the count of events at a single point in time. But what if you want the running total of events since the stream started? This is the concept of global aggregations:
If we wanted global aggregations, the same example as before (Time 1: 5 blue, 3 green; Time 2: 1 gohawks; Time 4: 2 green) would be calculated as:
- Time 1: 5 blue, 3 green
- Time 2: 5 blue, 3 green, 1 gohawks
- Time 4: 5 blue, 5 green, 1 gohawks
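The running totals above can be sketched in plain Python (not Spark code; the batch data and the `update_totals` helper are illustrative assumptions). Each micro-batch of (key, count) pairs is merged into the state carried over from the previous batch:

```python
from collections import Counter

def update_totals(running, batch):
    """Merge one micro-batch of (key, count) pairs into the running totals."""
    merged = Counter(running)
    for key, count in batch:
        merged[key] += count
    return dict(merged)

# Micro-batches from the example above (no events arrive at Time 3)
batches = [
    [("blue", 5), ("green", 3)],   # Time 1
    [("gohawks", 1)],              # Time 2
    [],                            # Time 3
    [("green", 2)],                # Time 4
]

totals = {}
for t, batch in enumerate(batches, start=1):
    totals = update_totals(totals, batch)
    print(f"Time {t}: {totals}")
```

Conceptually, this is what a stateful streaming aggregation does for you: the state (`totals`) persists across batches, so each batch only has to contribute its increments.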
In traditional batch processing, this would be similar to a `groupByKey` or a GROUP BY statement. But in a streaming application, this calculation needs to finish within milliseconds, which is typically too short a time window to perform a GROUP BY calculation over all the data seen so far. However, with Spark Streaming global aggregations, this calculation can be...