Introducing the primitive PTransform object – stateless ParDo
As we have already noted, the ParDo PTransform is the most basic primitive transform, and we can use it to do a variety of useful work. The name is an abbreviation of parallel do, and that is exactly what it does. There are multiple versions of this PTransform, each with different requirements and different behaviors, but the basics of the stateless ParDo remain valid for the other cases as well.
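To make the idea of "parallel do" more concrete, the following is a minimal sketch in plain Java, not the Beam API. The names StatelessParDoSketch, ElementFn, and parDo are hypothetical and exist only for this illustration. The sketch assumes that a stateless transform can be modeled as a function that sees one element at a time, may emit zero or more outputs for it, and keeps nothing between calls:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class StatelessParDoSketch {

    /** Hypothetical stand-in for a per-element function; this is NOT Beam's DoFn. */
    public interface ElementFn<I, O> {
        // May emit zero, one, or many outputs for a single input element.
        void process(I element, Consumer<O> output);
    }

    /**
     * Toy model of a stateless "parallel do": fn is applied to every element
     * independently, and nothing is shared between the individual calls.
     */
    public static <I, O> List<O> parDo(List<I> input, ElementFn<I, O> fn) {
        return input.parallelStream()
            .flatMap(element -> {
                // The buffer is local to this one element (assumption of the sketch).
                List<O> emitted = new ArrayList<>();
                fn.process(element, emitted::add);
                return emitted.stream();
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "", "c");
        // Split each line into words, dropping empty tokens.
        List<String> words = parDo(lines, (String line, Consumer<String> out) -> {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    out.accept(word);
                }
            }
        });
        System.out.println(words);
    }
}
```

Because the function only ever looks at the element it was handed, the calls can run in any order and on any worker, which is what allows the work to be parallelized. In Beam itself, the corresponding user code would be written as a DoFn and applied with ParDo, and the runner, rather than a Java parallel stream, decides how the elements are distributed.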
The essential parts of a ParDo object are illustrated in the following figure:
The first thing we notice is that the stream is split into chunks called bundles. The bundle size and other runtime parameters are runner-specific; that is, each runner can choose its preferred way of assigning the elements of a stream to bundles. The important thing to remember is that bundles are considered atomic units of work. The processing of a bundle...