The importance of lineage tracking in ML model development
We’ve touched on the concept of lineage tracking in earlier chapters, and now we will explore it in more detail. When we talk about lineage tracking, we’re referring to tracking all of the steps and artifacts that were used to create a given ML model. This includes items such as the following:
- The source datasets
- All transformations that were performed on those datasets
- All intermediate datasets that were created
- Which algorithm was used to train a model on the resulting data
- Which hyperparameters and values were used during model training
- Which platform and tools were used in the training
- If a hyperparameter tuning job was used, details of that job
- Details of any evaluation steps performed on the resulting model
- If the model is being served for online inference, details of the endpoint at which the model is hosted
The preceding list is not exhaustive. We generally...