Chapter 2: Implementing, Testing, and Deploying Basic Pipelines
Now that we are familiar with the basic concepts of streaming data processing, this chapter takes a deep dive into building something practical with Apache Beam.
The purpose of this chapter is to give you hands-on experience in solving practical problems from start to finish. The chapter is divided into subsections, each following the same structure:
- Defining a practical problem
- Discussing the problem decomposition (and how to solve the problem using Beam's PTransform)
- Implementing a pipeline to solve the defined problem
- Testing and validating that we have implemented our pipeline correctly
- Deploying the pipeline, both locally and to a running cluster
During this process (mostly while discussing the problem decomposition), we will look at the options Beam provides for addressing the problem, and we will highlight any caveats or common issues you might run into...