Introducing the portability layer
In this section, we will walk through the design of the portability layer – the FnAPI – to understand which components are orchestrated together to allow pipelines to be executed from different SDKs on the same Runner.
First, let's see how the whole portability layer works. This concept is illustrated in the following (somewhat simplified) diagram:
As we can see, the architecture consists of two types of components – Apache Beam components and Runner components. In this case, a Runner is a piece of technology that performs the actual execution – it may be Apache Flink, Apache Spark, Google Cloud Dataflow, or any other supported Runner. Each of these Runners typically has a coordinator that needs to receive a job submission and use this submission to create work for worker nodes. By doing this, it can orchestrate its execution. This coordinator...