Summary
In this chapter, we looked at the general design of Apache Beam's portability layer. We saw how this layer lets Runners and SDKs be developed independently. Once a portable Runner is implemented, it should be able to run pipelines written in any SDK, even an SDK that did not exist when the Runner was implemented.
We then took a deep dive into the Python SDK, which builds heavily on the portability layer. We saw that all SDKs mirror the core Apache Beam model concepts. Not all SDKs currently support the same set of features, but their feature sets should converge over time.
We reimplemented some of our familiar examples from the Java SDK in the Python SDK to learn how to write pipelines and submit them to a portable Runner. We used FlinkRunner for this, and we will continue to do so for the rest of this book. Next, we explored interactive programming using InteractiveRunner and Python notebooks. We saw...