Quarantining bad data with Delta Live Tables in Databricks
Quarantining bad data means routing it to a separate location for inspection and correction; once you have fixed the underlying issues, you can replay the quarantined data. With Delta Live Tables, you can quarantine records that do not meet your expectations or requirements. You define expectations on your data using SQL or Python expressions and specify how to handle records that violate them: keep them and log a warning, drop them, or fail the pipeline. Quarantining builds on these expectations as a pattern that routes the failing records to a separate table instead of discarding them.
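As a minimal Python sketch of how the three expectation actions look on a Delta Live Tables dataset (the table and column names here are illustrative, not from this recipe):

```python
import dlt

@dlt.table(comment="Cleaned records with expectations applied")
@dlt.expect("valid_zip", "zip IS NOT NULL")               # warn: keep the record, log the violation
@dlt.expect_or_drop("valid_state", "state IS NOT NULL")   # drop: remove the record from the output
@dlt.expect_or_fail("valid_id", "market_id IS NOT NULL")  # fail: stop the pipeline on any violation
def markets_clean():
    # "markets_raw" is a hypothetical upstream dataset defined elsewhere in the pipeline
    return dlt.read_stream("markets_raw")
```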
In this recipe, you will learn how to use Delta Live Tables to quarantine bad data in your data pipelines, and how to backfill the quarantined data after resolving the problems.
How to do it…
- Incremental ingestion with Auto Loader: Create a dataset from the farmers_markets_geographic_data CSV data by defining a streaming table. We define a streaming table to denote that this is an incremental, append-only load from the CSV... (see the sketch below).
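A sketch of how this ingestion step and the quarantine pattern might fit together, assuming a hypothetical source path and rule set (the path, rule names, and column names below are assumptions for illustration):

```python
import dlt
from pyspark.sql.functions import expr

# Hypothetical location of the landed CSV files; adjust to your environment.
SOURCE_PATH = "/Volumes/demo/raw/farmers_markets_geographic_data/"

# Illustrative data-quality rules, defined once and reused below.
RULES = {
    "valid_zip": "zip IS NOT NULL",
    "valid_county": "County IS NOT NULL",
}

@dlt.table(comment="Incremental append-only load of the CSV files via Auto Loader")
def farmers_markets_raw():
    return (
        spark.readStream
        .format("cloudFiles")                    # Auto Loader
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load(SOURCE_PATH)
    )

# Records that pass every rule flow into the clean table...
@dlt.table(comment="Records that satisfy all expectations")
@dlt.expect_all_or_drop(RULES)
def farmers_markets_clean():
    return dlt.read_stream("farmers_markets_raw")

# ...while records that fail any rule are routed to a quarantine table
# for inspection and later replay.
@dlt.table(comment="Records that violate at least one expectation")
def farmers_markets_quarantine():
    quarantine_condition = " OR ".join(f"NOT ({c})" for c in RULES.values())
    return dlt.read_stream("farmers_markets_raw").where(expr(quarantine_condition))
```

Defining the rules once in a dictionary keeps the clean and quarantine tables in sync: the quarantine condition is simply the inverse of the combined rules, so any record dropped from the clean table lands in quarantine, where it can be corrected and backfilled later.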