Summary
In this chapter, we investigated various ways to effectively structure our code to enable better reusability. We learned how to write our own PTransform and how the PTransform expansion works. We saw different types of objects serving as PInput – input objects to PTransform – or POutput – the output objects of PTransform. We looked at the most common examples of these objects – PCollection, PCollectionList, and PCollectionTuple. We also looked at two special cases – PBegin and PDone – which serve as the root and leaf nodes in the computational DAG, respectively. We also learned about the CoGroupByKey composite transform, which can be used to perform windowed joins.
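As a rough illustration of the idea behind such a join, the following plain-Java sketch first co-groups two keyed inputs and then derives an inner and a left outer join from the grouped result. It does not use the Beam API and ignores windowing entirely; the class CoGroupJoinSketch, the Grouped holder, and all method names are hypothetical and exist only for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Beam-free sketch: joins built on top of a co-grouping step.
public class CoGroupJoinSketch {

  /** Holds the values of both inputs that share a single key. */
  static final class Grouped {
    final List<String> left = new ArrayList<>();
    final List<String> right = new ArrayList<>();
  }

  /** Convenience factory for a key-value pair. */
  static Map.Entry<String, String> kv(String key, String value) {
    return new AbstractMap.SimpleImmutableEntry<>(key, value);
  }

  /** Groups the values of two keyed inputs by key (sorted for stable output). */
  static Map<String, Grouped> coGroup(
      List<Map.Entry<String, String>> left, List<Map.Entry<String, String>> right) {
    Map<String, Grouped> result = new TreeMap<>();
    for (Map.Entry<String, String> e : left) {
      result.computeIfAbsent(e.getKey(), k -> new Grouped()).left.add(e.getValue());
    }
    for (Map.Entry<String, String> e : right) {
      result.computeIfAbsent(e.getKey(), k -> new Grouped()).right.add(e.getValue());
    }
    return result;
  }

  /** Inner join: every left/right combination for keys present on both sides. */
  static List<String> innerJoin(Map<String, Grouped> grouped) {
    List<String> out = new ArrayList<>();
    grouped.forEach((key, g) -> {
      for (String l : g.left) {
        for (String r : g.right) {
          out.add(key + ":" + l + "," + r);
        }
      }
    });
    return out;
  }

  /** Left outer join: unmatched left values are paired with "null". */
  static List<String> leftOuterJoin(Map<String, Grouped> grouped) {
    List<String> out = new ArrayList<>();
    grouped.forEach((key, g) -> {
      for (String l : g.left) {
        if (g.right.isEmpty()) {
          out.add(key + ":" + l + ",null");
        }
        for (String r : g.right) {
          out.add(key + ":" + l + "," + r);
        }
      }
    });
    return out;
  }

  public static void main(String[] args) {
    Map<String, Grouped> grouped = coGroup(
        Arrays.asList(kv("a", "1"), kv("b", "2")),
        Arrays.asList(kv("a", "x"), kv("c", "y")));
    System.out.println(innerJoin(grouped));     // [a:1,x]
    System.out.println(leftOuterJoin(grouped)); // [a:1,x, b:2,null]
  }
}
```

In a real pipeline, the grouping step would be applied per key and per window, so only elements that fall into the same window would meet in one group.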
Then, we explored a DSL that offers a wrapper around CoGroupByKey – the Join library. This library offers all types of windowed joins – inner joins, one-sided outer joins, and full outer joins. We used this library to create an extension of our SportTracker...