Azure Databricks Cookbook

Accelerate and scale real-time analytics solutions using the Apache Spark-based analytics service

Product type: Paperback
Published: Sep 2021
Publisher: Packt
ISBN-13: 9781789809718
Length: 452 pages
Edition: 1st Edition
Authors (2):
Vinod Jaiswal
Phani Raj

Table of Contents

Preface
Chapter 1: Creating an Azure Databricks Service
Chapter 2: Reading and Writing Data from and to Various Azure Services and File Formats
Chapter 3: Understanding Spark Query Execution
Chapter 4: Working with Streaming Data
Chapter 5: Integrating with Azure Key Vault, App Configuration, and Log Analytics
Chapter 6: Exploring Delta Lake in Azure Databricks
Chapter 7: Implementing Near-Real-Time Analytics and Building a Modern Data Warehouse
Chapter 8: Databricks SQL
Chapter 9: DevOps Integrations and Implementing CI/CD for Azure Databricks
Chapter 10: Understanding Security and Monitoring in Azure Databricks
Other Books You May Enjoy

What this book covers

Chapter 1, Creating an Azure Databricks Service, explains how to create an Azure Databricks service from the Azure portal, Azure CLI, and ARM templates. You will learn about the different types of clusters in Azure Databricks, how to add users and groups, and how to authenticate to Azure Databricks using a personal access token.

Chapter 2, Reading and Writing Data from and to Various Azure Services and File Formats, explains how to read and write data from and to various data sources, including Azure SQL DB, Azure Synapse Analytics, ADLS Gen2, Azure Blob Storage, and Azure Cosmos DB, and in file formats such as CSV, JSON, and Parquet. You will use native connectors for Azure SQL and an Azure Synapse dedicated SQL pool to read and write data with improved performance.

Chapter 3, Understanding Spark Query Execution, dives deep into query execution: it explains how to check the Spark execution plan and how input, shuffle, and output partitions work. You will learn about the different types of joins used when working with DataFrames, as well as a few commonly used session-level configurations for changing the number of partitions.

Chapter 4, Working with Streaming Data, explains how to ingest streaming data from HDInsight Kafka clusters, Event Hub, and Event Hub for Kafka, and how to perform certain transformations and write the output to Spark tables and Delta tables for downstream consumers. You will learn about the various options available when ingesting data from streaming sources.

Chapter 5, Integrating with Azure Key Vault, App Configuration, and Log Analytics, explains how to integrate Azure Databricks with Azure resources such as Azure Key Vault and App Configuration to store credentials and secrets that are read from Azure Databricks notebooks, and how to integrate Azure Databricks with Log Analytics for telemetry.

Chapter 6, Exploring Delta Lake in Azure Databricks, explains how to use Delta for batch and streaming data in Azure Databricks. You will also learn how Delta Engine helps your queries run faster.

Chapter 7, Implementing Near-Real-Time Analytics and Building a Modern Data Warehouse, explains how to implement an end-to-end big data solution in which you read data from streaming sources such as Event Hub for Kafka and from batch sources such as ADLS Gen2, perform various transformations on the data, and then store it in destinations such as Azure Cosmos DB, Azure Synapse Analytics, and Delta Lake. You will build a modern data warehouse and orchestrate the end-to-end pipeline using Azure Data Factory. You will also perform near-real-time analytics using notebook visualizations and Power BI.

Chapter 8, Databricks SQL, explains how to run ad hoc SQL queries on the data in your data lake. You will learn how to create SQL endpoints with multiple clusters, write queries, create various visualizations on the data in your data lake, and create dashboards.

Chapter 9, DevOps Integrations and Implementing CI/CD for Azure Databricks, details how to integrate Azure DevOps Repo and GitHub with Databricks notebooks. You will learn how to implement Azure DevOps CI/CD for deploying notebooks across various environments (UAT and production) as well as how to deploy Azure Databricks resources using ARM templates and automate deployment using the Azure DevOps release pipeline.

Chapter 10, Understanding Security and Monitoring in Azure Databricks, covers pass-through authentication in Azure Databricks and how to restrict access to ADLS Gen2 using RBAC and ACLs so that users can read from Azure Databricks only the data to which they have access. You will also learn how to deploy Azure Databricks in a VNet and how to securely access the data in an ADLS Gen2 storage account.
