Understanding window aggregation on streaming data
We often encounter situations where we don't want streaming data to be processed as is; instead, we want to aggregate it and apply further transformations before writing it to the destination. Spark lets us aggregate streaming data with the window function, supporting both non-overlapping (tumbling) and overlapping (sliding) windows. In this recipe, we will learn how to perform aggregations on streaming data using the window function.
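Before looking at the Spark API, it helps to see how events map to windows. In a tumbling window, each event falls into exactly one bucket; in a sliding window, an event can fall into several overlapping buckets. The following plain-Python sketch (our own helper functions, not Spark code) illustrates that bucket assignment for a single event timestamp:

```python
from datetime import datetime, timedelta, timezone

def tumbling_window(ts, size):
    """Return the start of the single non-overlapping window containing ts."""
    epoch = int(ts.timestamp())
    width = int(size.total_seconds())
    start = epoch - (epoch % width)  # snap down to the window boundary
    return datetime.fromtimestamp(start, tz=timezone.utc)

def sliding_windows(ts, size, slide):
    """Return the starts of all overlapping windows containing ts."""
    width = int(size.total_seconds())
    step = int(slide.total_seconds())
    epoch = int(ts.timestamp())
    # The latest window start that still contains ts, then walk backwards.
    starts = []
    s = epoch - (epoch % step)
    while s > epoch - width:
        starts.append(datetime.fromtimestamp(s, tz=timezone.utc))
        s -= step
    return sorted(starts)

event = datetime(2021, 1, 1, 10, 7, tzinfo=timezone.utc)
# An event at 10:07 lands in the single 10:00 tumbling window of size 10 min,
# but in both the 10:00 and 10:05 sliding windows (size 10 min, slide 5 min).
print(tumbling_window(event, timedelta(minutes=10)))
print(sliding_windows(event, timedelta(minutes=10), timedelta(minutes=5)))
```

In Spark itself, this assignment is done for you by grouping on `window(timestampColumn, windowDuration[, slideDuration])`; the sketch above only mirrors the semantics.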
Getting ready
We will be using Event Hubs for Kafka as the source for streaming data.
You can use the Python script at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/blob/main/Chapter04/PythonCode/KafkaEventHub_Windows.py as the streaming producer that pushes data to Event Hubs for Kafka. Change the topic name in the Python script to kafkaenabledhub1.
You can refer to the Reading data from Kafka-enabled Event Hubs recipe to understand how to get the...