Chapter 3: Implementing Pipelines Using Stateful Processing
In the previous chapter, we focused on implementing pipelines that used high-level transformations. Such transforms tend to have few parameters or methods that must be implemented in order to use them, and this simplicity comes at the expense of flexibility. Let's demonstrate this using the example of the GroupByKey transform. It is quite simply defined as a transform that wraps all elements with the same key into an Iterable object. This Iterable object (essentially nothing more than a bag of elements) is then emitted when a trigger fires, as determined by the windowing strategy. Nothing more, nothing less. But what if we need finer control? What if we want to control exactly when we emit the output for a particular input element? In that case, these high-level transformations are no longer sufficient.
In this chapter, we will first (nearly) complete the picture of the primitive PTransform objects that Apache Beam has in the model...