Introducing composite transform – CoGroupByKey
In Task 11, we solved the tracker motivation problem by using side inputs. The underlying operation can be described as a join: we want to join two streams, a 5-minute average and a 1-minute average, compare them, and then output a notification. Side inputs are handy and efficient, provided they fit in memory. With enough users, we will likely run into trouble with this approach. What other options do we have to solve our problem? Fortunately, Apache Beam has a composite transform called CoGroupByKey for this purpose. The transform is composite because it wraps around GroupByKey and PCollectionTuple: each element of two or more input PCollections is tagged using a TupleTag and then processed using GroupByKey to produce a CoGbkResult, a wrapper object that holds all the values from each of the input PCollections with the same key and same window. This can be seen...
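To make the tag-then-group idea concrete, here is a minimal plain-Java sketch of the mechanism described above. It is a model, not Beam's implementation: the class CoGroupSketch, its Result type (a stand-in for CoGbkResult), the integer tags (standing in for TupleTag), and the sample user names and averages are all hypothetical. Windowing is ignored, so every element is assumed to fall into the same window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of a co-group; every name here is hypothetical, not Beam API. */
public class CoGroupSketch {

  /** Stand-in for CoGbkResult: one list of values per input, indexed by tag. */
  public static final class Result {
    private final List<List<Double>> valuesByTag = new ArrayList<>();

    Result(int inputCount) {
      for (int i = 0; i < inputCount; i++) {
        valuesByTag.add(new ArrayList<>());
      }
    }

    /** Returns every value that the input with the given tag contributed to this key. */
    public List<Double> getAll(int tag) {
      return valuesByTag.get(tag);
    }
  }

  /** An element of the flattened stream: key and value plus the tag of its input. */
  private static final class Tagged {
    final String key;
    final int tag;
    final double value;

    Tagged(String key, int tag, double value) {
      this.key = key;
      this.tag = tag;
      this.value = value;
    }
  }

  public static Map<String, Result> coGroup(List<List<Map.Entry<String, Double>>> inputs) {
    // Step 1: tag each element with the index of its input and flatten all inputs.
    List<Tagged> flattened = new ArrayList<>();
    for (int tag = 0; tag < inputs.size(); tag++) {
      for (Map.Entry<String, Double> element : inputs.get(tag)) {
        flattened.add(new Tagged(element.getKey(), tag, element.getValue()));
      }
    }
    // Step 2: one group-by-key over the flattened elements, routing values by tag.
    Map<String, Result> grouped = new TreeMap<>();
    for (Tagged t : flattened) {
      grouped.computeIfAbsent(t.key, k -> new Result(inputs.size()))
          .getAll(t.tag)
          .add(t.value);
    }
    return grouped;
  }

  public static void main(String[] args) {
    // Hypothetical per-user averages: tag 0 is the 1-minute input, tag 1 the 5-minute input.
    List<Map.Entry<String, Double>> oneMinute =
        List.of(Map.entry("alice", 9.0), Map.entry("bob", 4.0));
    List<Map.Entry<String, Double>> fiveMinute =
        List.of(Map.entry("alice", 5.0), Map.entry("carol", 6.0));

    Map<String, Result> joined = coGroup(List.of(oneMinute, fiveMinute));
    for (Map.Entry<String, Result> e : joined.entrySet()) {
      System.out.println(
          e.getKey() + " 1-min=" + e.getValue().getAll(0) + " 5-min=" + e.getValue().getAll(1));
    }
  }
}
```

Note that a key present in only one input still produces a result, with an empty list for the other tag, so the sketch behaves like a full outer join. In the Beam Java SDK, the corresponding pipeline code typically combines the inputs with KeyedPCollectionTuple.of(tag, collection).and(otherTag, otherCollection), applies CoGroupByKey.create(), and reads the grouped values with CoGbkResult.getAll(tag).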