Chapter 4: Structuring Code for Reusability
We have already walked through a great deal of the Apache Beam programming model, but we haven't yet investigated one of its core primitives: PTransform. We have seen many particular instances of PTransforms, but what if we wanted to implement our own? And should we even do that in the first place? In this chapter, we will explain exactly how Apache Beam builds the Directed Acyclic Graph (DAG) of operations, and we will use this knowledge to build a Domain-Specific Language (DSL) that solves a specific use case with less boilerplate code than plain Apache Beam requires. Then, we will introduce some of the built-in DSLs of Apache Beam. Last, but not least, we will learn how to view a stream of data as a time-varying relation, a fancy term for a table that changes over time. This will give us a base for introducing one additional DSL, SQL, which will be the topic of Chapter 5, Using SQL for Pipeline Implementation...