This chapter has been a whirlwind tour of the core concepts of Apache Beam and of how to run a basic WordCount pipeline using Apache Apex as a backend. Specifically, we looked at the following topics:
- The technical vision of Beam—any language on any data processing engine
- The main parallel processing patterns of Beam—ParDo and GroupByKey
- The features of the Beam model that support unbounded data—windowing, watermarks, and triggers
- A basic Beam pipeline to count occurrences of words
- Launching a Beam pipeline using Apache Apex on a YARN cluster
For more details on both Beam and the Apex runner for Beam, visit the Beam website at https://beam.apache.org. Also, follow @ApacheBeam on Twitter and join our user mailing list at user@beam.apache.org by following the instructions at https://beam.apache.org/get-started/support/.