Streaming data pipelines
In this section, we will work with a different kind of data source and learn how processing data in real time differs from the batch-oriented methods we used in the previous sections.
Streaming data pipeline concepts and tools
As before, before we start to build a streaming data processing pipeline, there are some important concepts and tools that we need to introduce and understand.
Apache Beam
Apache Beam is an open source, unified programming model for processing and analyzing large-scale data in both batch and streaming modes. It was initially developed by Google as part of its internal data processing tools and was later donated to the Apache Software Foundation (ASF). Beam provides a unified way to write data processing pipelines that can be executed on various distributed processing backends, such as Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow. It supports multiple programming languages, including Java, Python, and Go.