This chapter walks readers through the basic setup of the computation components required throughout the book. We will set up each component and run a few basic examples to validate the installation. The computation engines discussed in this chapter are Apache Spark, Apache Flink, and Apache Beam. Other computation engines are also available on the market.
According to their official websites, Apache Spark is a fast and general engine for large-scale data processing; Apache Flink is an open source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications; and Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using Apache Beam, you can run the program on your choice of computation...