Chapter 1, The Apache Spark Ecosystem, covered Spark Streaming and DStreams in detail. A new and different implementation of streaming, Structured Streaming, was introduced as an alpha release in Apache Spark 2.0.0 and became stable in Spark 2.2.0.
Structured Streaming, which is built on top of the Spark SQL engine, is a fault-tolerant, scalable stream-processing engine. A streaming computation is expressed in the same way as a batch computation on static data, which we presented in Chapter 1, The Apache Spark Ecosystem. The Spark SQL engine is responsible for running the computation incrementally and continuously, and for updating the final result as data continues to arrive. In this scenario, end-to-end exactly-once fault-tolerance guarantees are ensured through Write Ahead Logs ...
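The idea that a streaming computation is written like a batch computation, while the engine keeps the result up to date incrementally, can be illustrated with a small sketch in plain Python. This is a conceptual model only and does not use the Spark API: the `IncrementalWordCount` class, the `batch_word_count` function, and the sample micro-batches are all hypothetical names invented for this illustration. The sketch shows that folding each newly arrived micro-batch into a running result yields the same answer as one batch computation over all the data received so far.

```python
from collections import Counter
from typing import Iterable, List


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch computation: count words over a complete, static dataset."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Hypothetical stand-in for an engine that maintains a running result."""

    def __init__(self) -> None:
        self.result: Counter = Counter()

    def process(self, micro_batch: Iterable[str]) -> Counter:
        # Only the newly arrived lines are scanned; earlier input is never re-read.
        self.result.update(batch_word_count(micro_batch))
        return self.result


# Assumed sample input: three micro-batches arriving one after another.
micro_batches: List[List[str]] = [
    ["spark streaming", "structured streaming"],
    ["spark sql"],
    ["streaming sql engine"],
]

engine = IncrementalWordCount()
for batch in micro_batches:
    engine.process(batch)

# The incrementally maintained result matches a batch run over all the data.
all_lines = [line for batch in micro_batches for line in batch]
assert engine.result == batch_word_count(all_lines)
print(dict(engine.result))
```

The same query logic (`batch_word_count`) serves both the static and the streaming case; only the way the result is maintained differs. The sketch deliberately leaves out fault tolerance and exactly-once delivery.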