Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

Product type: Paperback

Published: May 2024

Publisher: Packt

ISBN-13: 9781837633357

Length: 438 pages

Edition: 1st Edition

Tools: Apache Spark

Concepts: Data Engineering

Author: Pulkit Chadha

Preface

Part 1 – Working with Apache Spark and Delta Lake

Chapter 1: Data Ingestion and Data Extraction with Apache Spark

Chapter 2: Data Transformation and Data Manipulation with Apache Spark

Chapter 3: Data Management with Delta Lake

Chapter 4: Ingesting Streaming Data

Chapter 5: Processing Streaming Data

Chapter 6: Performance Tuning with Apache Spark

Chapter 7: Performance Tuning in Delta Lake

Part 2 – Data Engineering Capabilities within Databricks

Chapter 8: Orchestration and Scheduling Data Pipeline with Databricks Workflows

Chapter 9: Building Data Pipelines with Delta Live Tables

Chapter 10: Data Governance with Unity Catalog

Chapter 11: Implementing DataOps and DevOps on Databricks

Index

Other Books You May Enjoy

Creating and managing catalogs, schemas, volumes, and tables using Unity Catalog

Unity Catalog introduces a hierarchy of data objects that organize your data assets:

Metastore: The top-level container for metadata. It exposes a three-level namespace (catalog.schema.table or catalog.schema.volume) for organizing your data.
Catalog: The first level of the namespace, used to group your data assets. A catalog contains schemas, which in turn hold tables and volumes. A catalog can also specify a storage location that is used by default for its schemas, tables, and volumes.
Schema: The second level of the object hierarchy, used to group related tables and volumes. A schema can also have a managed storage location that serves as the default location for its tables and volumes.
Table: The third level of the object hierarchy, used to access tabular data stored in cloud object storage. A table can be either managed or external. A managed table is backed by a managed storage location and is automatically...
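
To make the three-level namespace concrete, the following is a minimal sketch that builds fully qualified names and the corresponding CREATE statements as plain strings. All object names (demo_catalog, sales, orders, raw_files), the column definitions, and the helper functions are hypothetical and chosen only for illustration. The sketch assumes the metastore already has default managed storage, so no storage location is specified and the table and volume would be created as managed objects.

```python
# Illustrative sketch only: every name below is hypothetical.

def qualified_name(catalog: str, schema: str, obj: str) -> str:
    """Join the three namespace levels into catalog.schema.object."""
    return ".".join((catalog, schema, obj))


def build_statements(catalog: str, schema: str, table: str, volume: str) -> list[str]:
    """Return SQL statements that create one object at each level, top down."""
    return [
        # Level 1: the catalog.
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        # Level 2: a schema inside that catalog.
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
        # Level 3: a table (managed, because no location is given).
        f"CREATE TABLE IF NOT EXISTS {qualified_name(catalog, schema, table)} "
        "(order_id BIGINT, amount DOUBLE)",
        # Level 3: a volume (managed, because no location is given).
        f"CREATE VOLUME IF NOT EXISTS {qualified_name(catalog, schema, volume)}",
    ]


statements = build_statements("demo_catalog", "sales", "orders", "raw_files")
for stmt in statements:
    # In a Databricks notebook on Unity Catalog-enabled compute, each
    # statement could be executed with spark.sql(stmt) instead of printed.
    print(stmt)
```

Creating the objects top down matters because each level must exist before anything can be created inside it. Running the statements also requires the appropriate create privileges on the metastore, catalog, and schema.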
