Chapter 4: Working with Streaming Data
As data ingestion pipelines evolve, streaming systems such as Azure Event Hubs and Apache Kafka are increasingly used as sources and sinks in data pipeline applications. Streaming data, such as readings from temperature and vehicle sensors, has become commonplace, so we need to build data pipeline applications that can process it in real time and at scale. Azure Databricks provides a rich set of APIs, including Spark Structured Streaming, for processing these events in real time. The streaming data can be written to a variety of sinks: the Databricks File System (DBFS) in various file formats, streaming systems such as Event Hubs and Kafka, relational databases such as Azure SQL Database and Azure Synapse dedicated SQL pools, and NoSQL databases such as Azure Cosmos DB. In this chapter, we will learn how to read data from various streaming sources using Spark Structured Streaming in Azure Databricks...