Introducing the primitive PTransform object – Combine
So far, we have seen three grouping (stateful) transformations: Count, Top, and Max. None of these is a primitive transformation. A primitive transformation is one that needs direct support from a runner and cannot be expressed in terms of other transformations. The Combine object is the first primitive PTransform object that we are going to introduce. Beam has only five primitive PTransform objects, and we will walk through all of them in this chapter. We call non-primitive PTransform objects composite transformations.
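To make the primitive/composite distinction more concrete, the following is a minimal plain-Java sketch of how counting and taking a maximum can both be written on top of a single generic reduction. It does not use the Beam API: the class CombineSketch and the methods combine, count, and max are hypothetical names chosen for illustration, and the model ignores windows, keys, and distribution. Beam's real Combine works with a richer accumulator interface (Combine.CombineFn), so treat this only as an analogy for how a composite delegates to one underlying operation.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Plain-Java model of a global reduction. None of these names are Beam API;
// they are hypothetical helpers used only to illustrate the idea.
class CombineSketch {

  // Stand-in for a "combine" primitive: turn each element into an
  // accumulator value, then merge accumulators with an associative operator.
  static <T, A> A combine(
      List<T> input, A identity, Function<T, A> toAccumulator, BinaryOperator<A> merge) {
    A result = identity;
    for (T element : input) {
      result = merge.apply(result, toAccumulator.apply(element));
    }
    return result;
  }

  // "Count" as a composite: every element contributes 1 and the ones are summed.
  static <T> long count(List<T> input) {
    return combine(input, 0L, element -> 1L, Long::sum);
  }

  // "Max" as a composite: each element is its own accumulator and the larger wins.
  static long max(List<Long> input) {
    return combine(input, Long.MIN_VALUE, element -> element, Long::max);
  }

  public static void main(String[] args) {
    List<Long> values = List.of(3L, 7L, 5L);
    System.out.println(count(values)); // prints 3
    System.out.println(max(values));   // prints 7
  }
}
```

In this sketch, only combine would need dedicated support from whatever executes it; count and max are thin wrappers that supply an identity value and a merge function, which is the role a composite transformation plays relative to a primitive one.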
The Combine PTransform object generally performs a reduction operation on a PCollection object. As the name suggests, the transform combines multiple input elements into a single output value per window (Combine.globally) or per key and window (Combine.perKey). This reduction is illustrated by the following figure: