Explaining PTransform expansion
A PTransform, short for parallel transform, is an Apache Beam primitive for transforming a PInput into a POutput. PInput is a labeling interface that marks objects as suitable inputs to a PTransform, while POutput marks objects as suitable outputs. We already know these objects quite well: a typical one, used as both input and output, is PCollection. There are others as well, most notably PCollectionTuple and PCollectionList. There are also two special objects, PBegin and PDone. As we already know, an Apache Beam program (a pipeline) is a DAG whose edges represent PCollections and whose nodes represent PTransforms. PTransforms that take PBegin as input are the roots of the DAG, while PTransforms that produce PDone are its leaves.
This can be seen in the following diagram:
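The same root-to-leaf shape can also be sketched in code. The following is a toy model in plain Java, not Apache Beam itself: the class ExpansionSketch and the transforms CREATE, UPPER, and SINK are hypothetical names, and the nested types are simplified stand-ins that keep only the labeling role described above. The method name expand is borrowed from the Beam SDK; everything else is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are simplified stand-ins, NOT Apache Beam's classes.
final class ExpansionSketch {

    // Labeling interfaces: they only tag what may enter or leave a transform.
    interface PInput {}
    interface POutput {}

    // A collection can be both consumed and produced, so it carries both labels.
    static final class PCollection<T> implements PInput, POutput {
        final List<T> elements;
        PCollection(List<T> elements) { this.elements = elements; }
    }

    // Special endpoints: PBegin is input-only, PDone is output-only.
    static final class PBegin implements PInput {}
    static final class PDone implements POutput {}

    // A transform turns one PInput into one POutput.
    interface PTransform<I extends PInput, O extends POutput> {
        O expand(I input);
    }

    // Root: consumes PBegin, so nothing precedes it in the DAG.
    static final PTransform<PBegin, PCollection<String>> CREATE =
        begin -> new PCollection<String>(Arrays.asList("a", "b"));

    // Inner node: takes a PCollection and produces a new PCollection.
    static final PTransform<PCollection<String>, PCollection<String>> UPPER = in -> {
        List<String> out = new ArrayList<>();
        for (String s : in.elements) {
            out.add(s.toUpperCase());
        }
        return new PCollection<String>(out);
    };

    // Leaf: produces PDone, so nothing can follow it.
    static final PTransform<PCollection<String>, PDone> SINK = in -> new PDone();

    public static void main(String[] args) {
        PCollection<String> words = CREATE.expand(new PBegin());
        PCollection<String> shouted = UPPER.expand(words);
        PDone done = SINK.expand(shouted);
        System.out.println(shouted.elements); // prints [A, B]
    }
}
```

In this sketch, the compiler enforces the DAG's endpoints: PDone does not implement PInput, so no transform can be applied to the result of SINK, and PBegin does not implement POutput, so no transform can produce it.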
A PTransform is a recursive...