Creating a multi-hop medallion architecture data pipeline with Delta Live Tables in Databricks
The multi-hop medallion architecture is a data design pattern that organizes data in a lakehouse into multiple layers: bronze, silver, and gold. The bronze layer contains raw data from various sources, the silver layer contains validated and conformed data, and the gold layer contains curated and enriched data for analytics and AI.
In this recipe, you will learn how to use Delta Live Tables to create a multi-hop medallion architecture data pipeline in Databricks. You will use SQL to define your datasets and pipelines.
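To make the layering concrete, here is a minimal sketch of how the silver and gold hops might be declared in Delta Live Tables SQL. It assumes a bronze streaming table named device_data already exists in the same pipeline; the table names device_silver and device_gold, the device_id column, and the constraint name are hypothetical and chosen only for illustration.

```sql
-- Silver hop: keep only validated rows from the bronze table.
-- ASSUMPTION: device_id is a hypothetical column; device_silver is an illustrative name.
CREATE OR REFRESH STREAMING TABLE device_silver (
  CONSTRAINT valid_device_id EXPECT (device_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "Validated device readings (illustrative silver table)"
AS SELECT *
FROM STREAM(LIVE.device_data);

-- Gold hop: a curated aggregate for analytics.
-- ASSUMPTION: device_gold and event_count are illustrative names.
CREATE OR REFRESH LIVE TABLE device_gold
COMMENT "Event counts per device (illustrative gold table)"
AS SELECT device_id, COUNT(*) AS event_count
FROM LIVE.device_silver
GROUP BY device_id;
```

Each statement reads from the dataset one layer below it, which is what makes the pipeline "multi-hop": the expectation on the silver table drops rows that fail validation, and the gold table is recomputed from the validated data.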
How to do it…
- Incremental ingestion with Auto Loader: Create a bronze dataset from the iot-stream device JSON data and the iot-stream user CSV data.
  - Create a streaming table for device data: We define a streaming table to denote that this is an incremental, append-only load from the JSON files that land in a folder into the device_data Delta Lake table:
CREATE...
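The statement above is truncated in the source. As a rough sketch only, an Auto Loader streaming table definition for device_data could look like the following. The landing path is a placeholder assumption, and the schema inference option is an illustrative choice rather than the recipe's actual setting.

```sql
-- Bronze: incremental, append-only ingestion of JSON files with Auto Loader.
-- ASSUMPTION: "/path/to/landing/device" is a placeholder, not the recipe's real path.
CREATE OR REFRESH STREAMING TABLE device_data
COMMENT "Raw device JSON ingested incrementally (illustrative sketch)"
AS SELECT *
FROM cloud_files(
  "/path/to/landing/device",                       -- folder where JSON files land
  "json",                                          -- source file format
  map("cloudFiles.inferColumnTypes", "true")       -- optional: infer column types
);
```

Because the table is declared as a streaming table, each pipeline update processes only files that have arrived since the previous run instead of reloading the whole folder.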