Structured Streaming
Structured Streaming is a brand new addition to Apache Spark's stream processing vertical. It is a stream processing engine built on top of the Spark SQL engine. Structured Streaming unifies batch processing and stream processing, as it allows us to develop a stream processing application in much the same way as a batch processing application. At the same time, it is scalable and fault tolerant.
As per Apache Spark's documentation, "Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming."
In Structured Streaming, the Dataset API is used instead of DStream, and it is the responsibility of the Spark SQL engine to keep the dataset updated as new streaming data arrives. Because the Dataset API is used, all Spark SQL operations are available, so users can run SQL queries on the streaming data using the optimized Spark SQL engine...
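To make this concrete, here is a minimal sketch in Python of a running word count expressed as an ordinary SQL query over a stream. It assumes PySpark is installed and that text lines arrive on a socket. The view name lines, the helper names streaming_word_counts and run, and the host and port values are illustrative choices, not part of Spark. Python exposes the untyped DataFrame form of the Dataset API.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; PySpark is still required at run time
    from pyspark.sql import DataFrame, SparkSession

# Ordinary Spark SQL; "lines" is a view name chosen for this sketch.
WORD_COUNT_SQL = """
    SELECT word, COUNT(*) AS total
    FROM (SELECT explode(split(value, ' ')) AS word FROM lines) AS words
    GROUP BY word
"""


def streaming_word_counts(spark: "SparkSession", host: str, port: int) -> "DataFrame":
    """Return a streaming DataFrame of running word counts read from a socket."""
    # The socket source produces an unbounded table with one string column, `value`.
    lines = (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )
    # Registering the stream as a temporary view lets us query it with plain SQL.
    lines.createOrReplaceTempView("lines")
    return spark.sql(WORD_COUNT_SQL)


def run(spark: "SparkSession", host: str = "localhost", port: int = 9999) -> None:
    """Start the query; the console sink prints the updated counts after each trigger."""
    counts = streaming_word_counts(spark, host, port)
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()  # blocks until the streaming query is stopped
```

With a session created through SparkSession.builder.getOrCreate() and a text source listening on the chosen port (for example, nc -lk 9999), calling run(spark) would start the query. The SQL itself is the same as it would be for a batch table; only the read and write calls (readStream and writeStream) mark the job as streaming.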