Versioning data
There may be times when we want to persist data without overwriting a prior version of the data file. This can be accomplished by appending a timestamp or a unique identifier to the filename. However, there are more elegant solutions available. One such solution is the Delta Lake library, which we will explore in this recipe.
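For comparison, the simpler filename-based approach can be sketched with the standard library alone. This is a minimal illustration, not part of the recipe: the helper versioned_path, the file name landtemps.csv, and the toy rows are all hypothetical.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def versioned_path(base: Path) -> Path:
    """Return a sibling of `base` whose name carries a UTC timestamp and a short unique ID."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    unique = uuid.uuid4().hex[:8]  # guards against two saves within the same second
    return base.with_name(f"{base.stem}_{stamp}_{unique}{base.suffix}")


# Hypothetical output folder and file name, used only for this demonstration.
out_dir = Path(tempfile.mkdtemp())

# Each save gets its own file, so the earlier version is never overwritten.
first = versioned_path(out_dir / "landtemps.csv")
first.write_text("station,temp\nA,10.5\n")

second = versioned_path(out_dir / "landtemps.csv")
second.write_text("station,temp\nA,11.5\n")

saved = sorted(p.name for p in out_dir.iterdir())
print(saved)
```

This works, but the caller has to track which file is the latest and which files belong together, which is the bookkeeping a versioned table format takes over.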
We will work with the land temperature data again in this recipe. We will load the data, save it to a data lake, and then save an altered version to the same data lake.
Getting ready
We will be using the Delta Lake library in this recipe, which can be installed with pip install deltalake. We will also need the os library so that we can make a directory for the data lake.
How to do it...
You can get started with the data and version it as follows:
- We start by importing the Delta Lake library. We also create a folder called temps_lake for our data versions:
  import pandas as pd
  from deltalake.writer import write_deltalake...