Building Spark streaming applications
In this section, we focus primarily on the structured streaming feature introduced in Spark 2.0. The structured streaming APIs are generally available (GA) as of Spark 2.2, and they are the preferred way to build streaming Spark applications. Spark 2.2 also includes several updates to the Kafka-based processing components, including performance improvements. We introduced structured streaming in Chapter 1, Getting Started with Spark SQL. In this chapter, we go deeper into the topic and present several code examples to showcase its capabilities.
As a quick recap, structured streaming provides fast, scalable, fault-tolerant stream processing with end-to-end exactly-once guarantees, without the developer having to reason about the underlying streaming mechanisms.
It is built on the Spark SQL engine, and the streaming computations can be expressed in the same way batch computations are expressed on static data. It provides several data abstractions including Streaming...
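To make concrete the point that streaming computations are expressed in the same way as batch computations, here is a minimal sketch in Scala. It is an illustration under stated assumptions, not code from this chapter: the object name, the countByAction helper, the two-column schema, and the /tmp/events input directory are all hypothetical, and the directory is assumed to exist and to contain JSON files.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object BatchVsStreamingSketch {

  // Hypothetical helper: the same aggregation works on a static
  // DataFrame and on a streaming DataFrame.
  def countByAction(events: DataFrame): DataFrame =
    events.groupBy("action").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("BatchVsStreamingSketch")
      .master("local[2]")
      .getOrCreate()

    // Assumed schema of the JSON records; streaming file sources
    // require the schema to be supplied up front.
    val schema = new StructType()
      .add("time", TimestampType)
      .add("action", StringType)

    // Hypothetical input directory containing JSON files.
    val inputPath = "/tmp/events"

    // Batch: read the files once and print the counts.
    val staticEvents = spark.read.schema(schema).json(inputPath)
    countByAction(staticEvents).show()

    // Streaming: treat files arriving in the same directory as a stream.
    val streamingEvents = spark.readStream.schema(schema).json(inputPath)

    // Complete output mode re-emits the full aggregate after each trigger.
    val query = countByAction(streamingEvents)
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

In this sketch, the only differences between the two paths are read versus readStream on the input side and show() versus writeStream ... start() on the output side; the aggregation itself is shared.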