Tracking data versioning in Delta Lake
In this section, we'll learn how data is tracked in MLflow. Historically, data management and versioning have been treated as separate from machine learning and data science. However, with the advent of data-centric AI, data is playing an increasingly important role, particularly in DL. Therefore, it is critical to know what data is being used to improve a DL model, and how it is being used. In the first data-centric AI competition, which was organized by Andrew Ng in the summer of 2021, winning did not depend on changing or tuning a model, but rather on improving the dataset for a fixed model (https://https-deeplearning-ai.github.io/data-centric-comp/). Here is a quote from the competition's web page: