Chapter 4: Tracking Code and Data Versioning
DL models are more than just models: they are intimately tied to the code that trains and tests them and to the data used for training and testing. If we don't track the code and data used for a model, it is impossible to reproduce or improve that model. Furthermore, there has been a recent industry-wide shift toward data-centric AI (https://www.forbes.com/sites/gilpress/2021/06/16/andrew-ng-launches-a-campaign-for-data-centric-ai/?sh=5cbacdc574f5), in which data is elevated to a first-class artifact in building ML and, especially, DL models. For these reasons, in this chapter we will learn how to track code and data versioning using MLflow. We will look at the different ways to track code and pipeline versions and at how to use Delta Lake for data versioning. By the end of this chapter, you will be able to understand and implement tracking techniques for...