Introducing and using cross-language pipelines
Cross-language pipelines are a natural concept that comes with Beam's portability. Every executed PTransform in a pipeline has an associated environment. This environment describes how (DOCKER, EXTERNAL, PROCESS) and what (the Python SDK, Java SDK, Go SDK, and so on) should be executed by the Runner so that the pipeline behaves as intended by the pipeline author. Most of the time, all PTransforms in a single pipeline share the same SDK and the same environment, but this does not have to be the rule. Viewed purely from the Runner's perspective, it makes no difference whether the Runner executes a Python transform or a Java transform: the Runner code is already written in an SDK-language-agnostic way.
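To make the idea of a per-transform environment more concrete, here is a minimal sketch in plain Python. It does not use the Beam API. The names `Environment`, `Transform`, and `group_by_environment`, as well as the environment IDs and transform names, are hypothetical and chosen only for illustration. The sketch assumes that each transform refers to its environment by an ID and that a Runner needs only that ID to decide where the transform runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EnvironmentType(Enum):
    """The "how": the way the Runner reaches the SDK code (names from the text)."""
    DOCKER = "DOCKER"
    EXTERNAL = "EXTERNAL"
    PROCESS = "PROCESS"


@dataclass(frozen=True)
class Environment:
    """Hypothetical model of an environment: how to run, and what to run."""
    how: EnvironmentType
    what: str  # for example "Python SDK" or "Java SDK"


@dataclass(frozen=True)
class Transform:
    """Hypothetical model of a PTransform that points at its environment by ID."""
    name: str
    environment_id: str


def group_by_environment(transforms):
    """Group transform names by environment ID, preserving pipeline order.

    The grouping never inspects the SDK language, which mirrors the claim
    that the Runner is language-agnostic.
    """
    groups = defaultdict(list)
    for transform in transforms:
        groups[transform.environment_id].append(transform.name)
    return dict(groups)


# Illustrative data: one pipeline whose transforms come from two SDKs.
environments = {
    "java_env": Environment(EnvironmentType.DOCKER, "Java SDK"),
    "python_env": Environment(EnvironmentType.DOCKER, "Python SDK"),
}
transforms = [
    Transform("ReadInput", "java_env"),
    Transform("ParseLines", "python_env"),
    Transform("CountWords", "python_env"),
]

plan = group_by_environment(transforms)
for env_id, names in plan.items():
    env = environments[env_id]
    print(f"{env.what} via {env.how.value}: {names}")
```

In this toy model, mixing SDKs in one pipeline requires no special handling: a transform from another language is simply one whose environment ID points to a different entry.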
The first thing we must understand is how the portable pipeline is represented. When an SDK builds and starts to execute a pipeline, it first compiles it into a portable...