Understanding window aggregation on streaming data
We often encounter situations where we don't want streaming data to be processed as is; instead, we want to aggregate it and apply further transformations before writing it to the destination. Spark lets us aggregate streaming data with the window function, supporting both non-overlapping (tumbling) and overlapping (sliding) windows. In this recipe, we will learn how to perform aggregations on streaming data using the window function.
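Before looking at the Spark API, it helps to see how events map to windows. In a tumbling window, each event falls into exactly one bucket; in a sliding window, an event can fall into several overlapping buckets. The following plain-Python sketch (our own helper functions, not Spark code) illustrates that bucket assignment for a single event timestamp:

```python
from datetime import datetime, timedelta, timezone

def tumbling_window(ts, size):
    """Return the start of the single non-overlapping window containing ts."""
    epoch = int(ts.timestamp())
    width = int(size.total_seconds())
    start = epoch - (epoch % width)  # snap down to the window boundary
    return datetime.fromtimestamp(start, tz=timezone.utc)

def sliding_windows(ts, size, slide):
    """Return the starts of all overlapping windows containing ts."""
    width = int(size.total_seconds())
    step = int(slide.total_seconds())
    epoch = int(ts.timestamp())
    # The latest window start that still contains ts, then walk backwards.
    starts = []
    s = epoch - (epoch % step)
    while s > epoch - width:
        starts.append(datetime.fromtimestamp(s, tz=timezone.utc))
        s -= step
    return sorted(starts)

event = datetime(2021, 1, 1, 10, 7, tzinfo=timezone.utc)
# An event at 10:07 lands in the single 10:00 tumbling window of size 10 min,
# but in both the 10:00 and 10:05 sliding windows (size 10 min, slide 5 min).
print(tumbling_window(event, timedelta(minutes=10)))
print(sliding_windows(event, timedelta(minutes=10), timedelta(minutes=5)))
```

In Spark itself, this assignment is done for you by grouping on `window(timestampColumn, windowDuration[, slideDuration])`; the sketch above only mirrors the semantics.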
Getting ready
We will be using Event Hubs for Kafka as the source for streaming data.
You can use the Python script at https://github.com/PacktPublishing/Azure-Databricks-Cookbook/blob/main/Chapter04/PythonCode/KafkaEventHub_Windows.py as the streaming producer that pushes data to Event Hubs for Kafka. Change the topic name in the Python script to kafkaenabledhub1.
You can refer to the Reading data from Kafka-enabled Event Hubs recipe to understand how to get the...