Creating a Delta Lake table
With the environment set up, we are ready to understand how Delta Lake works. In our Spark session, we have a Spark DataFrame that stores the data of the store_orders table, which was ingested during the first run of the electroniz_batch_ingestion_pipeline.
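Before writing the table out, it can help to take a quick look at the ingested data. The following is a minimal, self-contained sketch; the DataFrame name df and the sample columns are illustrative stand-ins for whatever the pipeline actually ingested, not the book's real schema:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-in for the ingested store_orders data; in the notebook,
# the real DataFrame comes from the electroniz_batch_ingestion_pipeline run.
df = spark.createDataFrame(
    [(1001, "laptop", 2), (1002, "monitor", 1)],
    ["order_id", "product", "quantity"],
)

df.printSchema()   # column names and types
df.show()          # a quick look at the rows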
Important Note
A Spark DataFrame is an immutable, distributed collection of data. It contains rows and columns, like a table in a relational database.
At this point, you should be comfortable running instructions in notebook cells. You can create a new cell with Ctrl + Alt + N, and after typing each command, press Shift + Enter to run it.
From here onwards, I will simply ask you to run the instructions, on the assumption that you know how to create new cells and run commands. Invoke the following instructions to write the store_orders Delta table:

SCRATCH_LAYER_NAMESPACE="scratch"
DELTA_TABLE_WRITE_PATH="wasbs://"+SCRATCH_LAYER_NAMESPACE+"@"+STORAGE_ACCOUNT...
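Since the snippet above is truncated, here is a minimal sketch of what the complete cell might look like. It assumes the ingested data lives in a DataFrame named df (as in the earlier sketch) and that STORAGE_ACCOUNT holds your Azure storage account name; the container name and target folder here are illustrative, and your Spark environment must already have the Delta Lake libraries configured:

# A hedged sketch, not the book's exact code: build the target path and
# write the DataFrame in Delta format.
SCRATCH_LAYER_NAMESPACE = "scratch"
STORAGE_ACCOUNT = "<your_storage_account_name>"  # assumption: set earlier in the notebook

# wasbs:// paths follow the pattern container@account.blob.core.windows.net/folder;
# the delta/store_orders folder is an illustrative choice.
DELTA_TABLE_WRITE_PATH = ("wasbs://" + SCRATCH_LAYER_NAMESPACE + "@" + STORAGE_ACCOUNT
                          + ".blob.core.windows.net/delta/store_orders")

# Delta is just another Spark data source; "overwrite" replaces any earlier copy.
df.write.format("delta").mode("overwrite").save(DELTA_TABLE_WRITE_PATH)

Under the hood, this write creates Parquet data files plus a _delta_log directory of JSON commit files at the target path; that transaction log is what makes the folder a Delta table rather than a plain Parquet dataset.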