In this section, we focus primarily on structured streaming, a feature introduced in Spark 2.0. The structured streaming APIs are generally available (GA) as of Spark 2.2, and they are the preferred way to build streaming Spark applications. Spark 2.2 also brought several updates to the Kafka-based processing components, including performance improvements. We introduced structured streaming in Chapter 1, Getting Started with Spark SQL. In this chapter, we go deeper into the topic and present several code examples that showcase its capabilities.
As a quick recap, structured streaming provides fast, scalable, fault-tolerant stream processing with end-to-end exactly-once guarantees, without the developer having to reason about the underlying streaming mechanisms.
It is built on the Spark SQL engine, and the streaming computations can be expressed...