Querying streaming data in real time
In this section, we will extend our Chicago crime example and use Spark SQL to analyze the streaming crime data in real time.
All Spark extensions build on a core architectural component of Spark: the RDD. Because of this, DStreams in Spark Streaming and DataFrames in Spark SQL are interoperable. A DStream is a sequence of RDDs, and each of those RDDs can be converted into a DataFrame (and a DataFrame back into an RDD). Let's move ahead and understand the integration architecture of Spark Streaming and Spark SQL. We will then put it into practice by developing an application that queries streaming data in real time. Let's refer to this job as SQL Streaming Crime Analyzer.
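As a rough illustration of this interoperability, the following minimal sketch converts each batch of a DStream into a DataFrame and queries it with SQL. It is not the job developed in this chapter: it is written in Python against PySpark's legacy DStream API, and the three-column record layout, the helper names (parse_crime, run), the view name crimes, and the host and port defaults are all assumptions made for the example.

```python
# Hypothetical, simplified column layout; the real crime file has more fields.
COLUMNS = ["case_id", "primary_type", "district"]


def parse_crime(line):
    """Split one comma-separated line into a tuple of fields, or return None if malformed."""
    fields = tuple(f.strip() for f in line.split(","))
    if len(fields) != len(COLUMNS) or not all(fields):
        return None
    return fields


def run(host="localhost", port=9999, batch_seconds=5):
    """Start the streaming job (assumed host/port). pyspark is imported only when this is called."""
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext

    spark = SparkSession.builder.appName("SQLStreamingCrimeAnalyzer").getOrCreate()
    ssc = StreamingContext(spark.sparkContext, batch_seconds)

    # DStream of parsed records; malformed lines are dropped.
    records = (ssc.socketTextStream(host, port)
               .map(parse_crime)
               .filter(lambda rec: rec is not None))

    def query_batch(batch_time, rdd):
        # Each batch arrives as a plain RDD, which Spark SQL can turn into a DataFrame.
        if rdd.isEmpty():
            return
        df = spark.createDataFrame(rdd, COLUMNS)
        df.createOrReplaceTempView("crimes")
        spark.sql(
            "SELECT primary_type, COUNT(*) AS total "
            "FROM crimes GROUP BY primary_type ORDER BY total DESC"
        ).show()

    records.foreachRDD(query_batch)
    ssc.start()
    ssc.awaitTermination()
```

Calling run() starts the job and blocks until it is stopped. The conversion happens inside foreachRDD because a DStream has no DataFrame form of its own; only the RDD behind each batch can be handed to Spark SQL.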
The high-level architecture of our job
The high-level architecture of our SQL Streaming Crime Analyzer will essentially consist of the following three components:
Crime producer: This is a producer that will randomly read the crime records from the file and push the data to a socket. This is the same crime record file which...