The Delta Lake
Up to this point, we have focused solely on enabling you to use Databricks SQL. Now that we have accomplished that, let’s investigate the technologies that enable Databricks SQL to run your data warehousing workloads on what seems to be a data lake.
In this chapter, we will focus on the primary storage format of the Databricks Lakehouse: Delta Lake. Why should you care? Because, unlike other cloud data warehouses, the Databricks Lakehouse stores data in open storage formats such as Delta Lake, Parquet, Optimized Row Columnar (ORC), and comma-separated values (CSV), instead of proprietary formats.
We will begin by understanding the challenges posed by using other storage formats, how they affect the business intelligence (BI) experience on traditional data lakes, and how the Delta Lake format addresses them. Then, we will learn about the performance boosters available with Delta Lake in Databricks.
The primary audience...