Performing upserts of data
Delta Lake also supports the upsert operation using the MERGE
clause. The idea of the upsert is simply to merge data in the existing table with new data originating from the source. It is very common in a lakehouse to ingest incremental data and merge it with the pre-existing table in Delta Lake. The MERGE
process is repeated each time new data gets ingested and is instrumental in creating the single truth of data in the silver layer of the lakehouse.
Important Note
An UPSERT
operation is a mix of updates and inserts in the same operation. Using the MERGE
clause, you can perform an insert in a table if it does not exist or simply update existing data if it does.
- Before we start the
MERGE
process, let's find out the current number of rows in thestore_orders
table:%sql SELECT count(*) FROM store_orders;
This results in the following output:
- Now, we will read the incrementally...