Before we dive into Structured Streaming, let's start by talking about DStreams. DStreams are built on top of RDDs and represent a stream of data divided into small chunks. The following figure represents these chunks of data as micro-batches, whose duration can range from milliseconds to seconds. In this example, the DStream of lines is micro-batched into one-second intervals, where each square represents a micro-batch of the events that occurred within that one-second window:
- At the 1-second time interval, there are five occurrences of the event blue and three occurrences of the event green
- At the 2-second time interval, there is a single occurrence of the event gohawks
- At the 4-second time interval, there are two occurrences of the event green
![](https://static.packt-cdn.com/products/9781788835367/graphics/assets/b4e47b01-ac85-4f22-9b6d-572e2d59883e.png)
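To make the micro-batch model concrete, here is a minimal sketch of a DStream that batches a socket text source into one-second intervals and counts the events in each batch, mirroring the per-second counts of blue, green, and gohawks in the figure. It assumes Spark Streaming's Python API, a local master, and a text source on localhost port 9999 (for example, one started with `nc -lk 9999`); the application name, port, and one-second batch duration are illustrative choices, not taken from the book:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Local Spark context with two threads: one to receive data, one to process it
sc = SparkContext("local[2]", "DStreamMicroBatchExample")

# One micro-batch per second (the batch duration is an illustrative choice)
ssc = StreamingContext(sc, batchDuration=1)

# Every line received within a one-second window lands in that window's micro-batch
lines = ssc.socketTextStream("localhost", 9999)

# Count how many times each event occurs within each micro-batch
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda event: (event, 1))
               .reduceByKey(lambda a, b: a + b))

counts.pprint()  # print each micro-batch's counts to the console

ssc.start()             # start receiving and processing data
ssc.awaitTermination()  # keep the stream running until it is stopped
```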
Because DStreams are built on top of RDDs, Apache Spark's core data abstraction, Spark Streaming can easily integrate with other Spark components...