Processing streaming and batch data using Structured Streaming
We often encounter scenarios where batch data stored in ADLS Gen2 in comma-separated values (CSV) or Parquet format must be processed together with data from real-time streaming sources such as Event Hubs. In this recipe, we will learn how to use Structured Streaming to combine batch and real-time streaming sources and process the data together. We will also fetch the metadata required for our processing from Azure SQL Database.
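As a rough sketch of what this recipe builds toward — a batch read from ADLS Gen2, a Structured Streaming read from Event Hubs, and a metadata lookup from Azure SQL Database combined in one job — the following PySpark outline may help. All paths, server names, table names, column names, and credentials below are hypothetical placeholders, and the Event Hubs Spark connector (`azure-eventhubs-spark`) is assumed to be attached to the cluster:

```python
# A sketch only: every path, name, and credential below is a hypothetical
# placeholder, not a value from the recipe.

def jdbc_options(server: str, database: str, table: str,
                 user: str, password: str) -> dict:
    """Assemble the options for reading an Azure SQL Database table over JDBC."""
    return {
        "url": f"jdbc:sqlserver://{server}.database.windows.net:1433;"
               f"database={database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }

def run_pipeline(spark):
    """Outline: combine a batch source, a streaming source, and SQL metadata."""
    # Batch source: CSV files landed in ADLS Gen2.
    batch_df = (spark.read.format("csv")
                .option("header", "true")
                .load("abfss://container@account.dfs.core.windows.net/input/"))

    # Streaming source: Event Hubs, read with Structured Streaming.
    # The connection string should be encrypted with the connector's
    # EventHubsUtils.encrypt helper before being passed as an option.
    eh_conf = {
        "eventhubs.connectionString":
            spark.sparkContext._jvm.org.apache.spark.eventhubs
                 .EventHubsUtils.encrypt("Endpoint=sb://...;EntityPath=...")
    }
    stream_df = spark.readStream.format("eventhubs").options(**eh_conf).load()

    # Metadata lookup from Azure SQL Database over JDBC.
    meta_df = (spark.read.format("jdbc")
               .options(**jdbc_options("myserver", "metadata_db",
                                       "dbo.SourceMetadata", "user", "pwd"))
               .load())

    # A stream-static join enriches each streaming micro-batch with the
    # static metadata; the result is written out as a streaming query.
    return (stream_df.join(meta_df, "source_id", "left")
            .writeStream.format("delta")
            .option("checkpointLocation", "/chk/recipe")
            .start("/output/recipe"))
```

The `run_pipeline` function is not invoked here because it needs a live cluster with access to the Azure resources; the recipe's notebook walks through each of these reads step by step.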
Getting ready
Before starting, you need a valid Azure subscription with Contributor access, a Databricks workspace (Premium tier), and an ADLS Gen2 storage account. Also, ensure that you have completed the previous recipes of this chapter.
We have executed the notebook on Databricks Runtime 7.5, which includes Spark 3.0.1.
You can follow along by running the steps in the following notebook:
https://github.com/PacktPublishing/Azure-Databricks-Cookbook/blob/main/Chapter07...