Introduction
Spark streaming is an evolving journey toward unifying and structuring the APIs so that batch and stream processing can be handled in the same way. Spark streaming has been available since the early releases of Spark (it first shipped as an alpha component in Spark 0.7) with Discretized Stream (DStream). The new direction is to abstract the underlying framework using an unbounded table model in which users can query the table using SQL or functional programming and write the result to an output table in one of several output modes: complete, delta (named update in the released API), and append. The Spark SQL Catalyst optimizer and Tungsten (off-heap memory manager) are now intrinsic parts of Spark streaming, which leads to much more efficient execution.
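To make the unbounded table model and its output modes more concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: the `ToyStream` class, its `arrive` and `trigger` methods, and the mode strings are hypothetical names chosen only for illustration. The sketch assumes a word-count query for the complete and update (delta) modes and a simple stateless transformation for the append mode.

```python
from collections import Counter


class ToyStream:
    """Hypothetical stand-in for a stream viewed as an unbounded input table."""

    def __init__(self):
        self.input_table = []           # rows are only ever appended, never removed
        self._seen = 0                  # number of input rows already processed
        self._last_counts = Counter()   # aggregate result as of the previous trigger

    def arrive(self, *words):
        """New data arriving on the stream becomes new rows in the input table."""
        self.input_table.extend(words)

    def trigger(self, mode):
        """Re-evaluate the query and return what would be written to the sink."""
        new_rows = self.input_table[self._seen:]
        self._seen = len(self.input_table)

        if mode == "append":
            # Stateless query (upper-casing): only rows added since the
            # previous trigger are written out.
            return [word.upper() for word in new_rows]

        # Aggregating query (word count) over the whole input table.
        counts = Counter(self.input_table)
        previous, self._last_counts = self._last_counts, counts

        if mode == "complete":
            # The entire result table is written every time.
            return dict(counts)
        if mode == "update":
            # Only result rows that changed since the previous trigger are written.
            return {w: c for w, c in counts.items() if previous[w] != c}
        raise ValueError(f"unknown output mode: {mode}")


stream = ToyStream()

stream.arrive("spark", "stream")
first = stream.trigger("complete")   # {'spark': 1, 'stream': 1}

stream.arrive("spark")
second = stream.trigger("update")    # {'spark': 2}

stream.arrive("table")
third = stream.trigger("append")     # ['TABLE']

print(first, second, third)
```

The point of the sketch is that the query is always expressed against the whole table, as if it were a batch query; the output mode only decides how much of the result is written out at each trigger. A real engine would compute the result incrementally rather than rescanning all rows as this toy does.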
In this chapter, we not only cover the streaming facilities available out of the box in Spark's machine learning library, but also provide four introductory recipes that we found useful as we worked toward a better understanding of Spark 2.0.
The following figure depicts what is covered in this chapter:
Spark 2.0+ builds on the success...