Understanding the data organization model in Databricks SQL
In this section, we will learn how data assets are organized in Databricks SQL. We call this the data organization model.
The open data lake, which is the foundation of the Databricks Lakehouse platform, relies on cloud object storage for storing data. This data is stored in human-readable formats such as CSV, TSV, and JSON, or in big-data-optimized formats such as Apache Parquet, Apache ORC, or Delta Lake.
A Note on Data Engineering
The data in the data lake is ingested by data engineering processes. Data engineers create data pipelines that bring data from source systems, clean and transform it, and write it to designated destinations in the data lake. These destinations are directories in the data lake. The data within a directory can be further arranged in some fashion – for example, by date.
These file formats are structured and have a defined schema. Having a schema...