What this book covers
Chapter 1, Introduction to Databricks, introduces Databricks along three dimensions. First, it introduces Databricks, the company. Second, it introduces the data lakehouse architecture – the core data platform design pattern enabled by Databricks. Third, it introduces the Databricks Lakehouse Platform, which is the platform that Databricks provides for your organization to implement the data lakehouse architecture.
Chapter 2, The Databricks Product Suite – A Visual Tour, presents a visual tour of Databricks SQL and the rest of the Databricks platform. It will teach you how to navigate the platform and locate features of interest with ease.
Chapter 3, The Data Catalog, introduces the data catalog of the Databricks Lakehouse Platform. Generated and populated by data engineers and consumed by data analysts, the data catalog is the central pillar of all your data operations. This chapter will teach you how the data objects – catalogs, schemas, tables, and views – are represented in the data catalog, and how to navigate and explore the catalog with the UI and SQL commands.
Chapter 4, The Security Model, discusses the Databricks data security model and teaches you how to use it to secure your data. Databricks provides a fine-grained yet easily programmable security model for all data and data-related assets.
Chapter 5, The Workbench, introduces the Databricks SQL workbench. The workbench is a set of capabilities that enable a simple, intuitive, and intelligent experience in query building and dashboarding. It gives users of the unified lakehouse platform an instant way to query the data and extract insights from it.
Chapter 6, The SQL Warehouses, introduces the compute power behind Databricks SQL. SQL Warehouses provide the elastic, scalable compute power that can execute Business Intelligence (BI) queries with ease, no matter the scale of the data. The cloud philosophy says storage and compute power should scale independently so that we can drive the maximum Return on Investment (ROI). This is exactly what the SQL Warehouses in Databricks SQL do.
Chapter 7, Using Business Intelligence Tools with Databricks SQL, teaches you how to connect your business intelligence tool of choice to Databricks SQL. This allows you to harness the power of Databricks SQL from the comfort of your favorite business intelligence tool.
Chapter 8, The Delta Lake, takes a deep dive into the default storage format of Databricks – Delta Lake. Delta Lake adds a layer of transactional intelligence to the otherwise simple data lake. This chapter will discuss the Delta Lake storage format and how it enables superior out-of-the-box query performance.
Chapter 9, The Photon Engine, takes a deep dive into Photon, the query engine that powers Databricks SQL. Photon is written from the ground up in native C++ and is compatible with the Apache Spark API. This chapter explores what makes Photon so fast.
Chapter 10, Warehouse on the Lakehouse, addresses one of the biggest mental leaps in adopting the data lakehouse architecture: implementing popular data warehousing patterns on the lakehouse.
Chapter 11, SQL Commands Part–1, introduces Databricks-specific SQL commands that are used for data definition and data manipulation operations.
Chapter 12, SQL Commands Part–2, introduces Databricks-specific SQL commands that are used for data security and metadata operations.
Chapter 13, Playing with the TPC-DS Dataset, introduces the TPC-DS dataset, a popular dataset for benchmarking decision support systems such as data warehouses. The chapter shows how to generate the TPC-DS dataset in Databricks and test, at scale, the concepts learned in the previous chapters.
Chapter 14, Ask Me Anything, presents and answers frequently asked questions about Databricks SQL.