Business Intelligence with Databricks SQL: Concepts, tools, and techniques for scaling business intelligence on the data lakehouse

Product type Paperback

Published in Sep 2022

Publisher Packt

ISBN-13 9781803235332

Length 348 pages

Edition 1st Edition

Languages

SQL

Concepts

Business Intelligence

Author (1):

Vihag Gupta

Table of Contents (21) Chapters

Preface

1. Part 1: Databricks SQL on the Lakehouse

2. Chapter 1: Introduction to Databricks

3. Chapter 2: The Databricks Product Suite – A Visual Tour

4. Chapter 3: The Data Catalog

5. Chapter 4: The Security Model

6. Chapter 5: The Workbench

7. Chapter 6: The SQL Warehouses

8. Chapter 7: Using Business Intelligence Tools with Databricks SQL

9. Part 2: Internals of Databricks SQL

10. Chapter 8: The Delta Lake

11. Chapter 9: The Photon Engine

12. Chapter 10: Warehouse on the Lakehouse

13. Part 3: Databricks SQL Commands

14. Chapter 11: SQL Commands – Part 1

15. Chapter 12: SQL Commands – Part 2

16. Part 4: TPC-DS, Experiments, and Frequently Asked Questions

17. Chapter 13: Playing with the TPC-DS Dataset

18. Chapter 14: Ask Me Anything

19. Index

20. Other Books You May Enjoy

What this book covers

Chapter 1, Introduction to Databricks, introduces Databricks along three dimensions. First, it introduces Databricks, the company. Second, it introduces the data lakehouse architecture, the core data platform design pattern enabled by Databricks. Third, it introduces the Databricks Lakehouse Platform, which Databricks provides so that your organization can implement the data lakehouse architecture.

Chapter 2, The Databricks Product Suite – A Visual Tour, presents a visual tour of Databricks SQL and the rest of the Databricks platform. It will teach you how to navigate the platform and locate features of interest with ease.

Chapter 3, The Data Catalog, introduces the data catalog of the Databricks Lakehouse Platform. Generated and populated by data engineers and consumed by data analysts, the data catalog is the central pillar of all your data operations. This chapter teaches you how the data objects (catalogs, schemas, tables, and views) are represented in the data catalog, and how to navigate and explore the catalog with the UI and with SQL commands.

Chapter 4, The Security Model, discusses the Databricks data security model and teaches you how to use it to secure your data. Databricks provides a fine-grained yet easily programmable security model that covers all data and data-related assets.

Chapter 5, The Workbench, introduces the Databricks SQL workbench. The workbench is a set of capabilities that enable a simple, intuitive, and intelligent experience in query building and dashboarding. It gives users of the unified lakehouse platform an instant way to query the data and extract insights from it.

Chapter 6, The SQL Warehouses, introduces the compute power behind Databricks SQL. SQL Warehouses provide elastic, scalable compute that can execute Business Intelligence (BI) queries with ease, no matter the scale of the data. The cloud philosophy says that storage and compute should scale independently so that we can maximize Return on Investment (ROI). This is exactly what the SQL Warehouses in Databricks SQL do.

Chapter 7, Using Business Intelligence Tools with Databricks SQL, teaches you how to connect your business intelligence tool of choice to Databricks SQL. This allows you to harness the power of Databricks SQL from the comfort of your favorite business intelligence tool.

Chapter 8, The Delta Lake, takes a deep dive into Delta Lake, the default storage format of Databricks. Delta Lake adds a layer of transactional intelligence to the otherwise simple data lake. This chapter discusses the Delta Lake storage format and how it enables superior out-of-the-box query performance.

Chapter 9, The Photon Engine, takes a deep dive into Photon, the query engine that powers Databricks SQL. Photon is written from the ground up in native C++ and is compatible with the Apache Spark API. This chapter explains what makes Photon so fast.

Chapter 10, Warehouse on the Lakehouse, addresses one of the biggest mental leaps in adopting the data lakehouse architecture: implementing popular data warehousing patterns on the lakehouse. This chapter discusses how to do so.

Chapter 11, SQL Commands – Part 1, introduces Databricks-specific SQL commands that are used for data definition and data manipulation operations.

Chapter 12, SQL Commands – Part 2, introduces Databricks-specific SQL commands that are used for data security and metadata operations.

Chapter 13, Playing with the TPC-DS Dataset, introduces the TPC-DS dataset, a popular dataset for benchmarking decision support systems such as data warehouses. The chapter shows how to generate the TPC-DS dataset in Databricks and how to test, at scale, the concepts learned in the previous chapters.

Chapter 14, Ask Me Anything, presents and answers frequently asked questions about Databricks SQL.