Tracking model provenance
Provenance tracking for digital artifacts has long been studied in the literature. For example, when a piece of patient diagnosis data is used in the biomedical industry, people usually want to know where it came from, what processing and cleaning has been applied to it, who owns it, and other history and lineage information about it. As ML/DL models are increasingly deployed in production for industrial and business scenarios, provenance tracking has become a required capability. Tracking provenance at different granularities is critical for operationalizing and managing not only offline data science experimentation but also the model before, during, and after its deployment to production. So, what needs to be tracked for provenance?
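To make the patient-data example concrete, the following is a minimal sketch of a provenance record holding the kinds of information listed above: the source, the owner, the processing and cleaning steps, and links to upstream artifacts from which lineage can be reconstructed. The `ProvenanceRecord` class, its field names, and the sample artifact IDs are hypothetical illustrations chosen for this sketch; they are not MLflow's API or the schema of any provenance standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """Hypothetical record of where an artifact came from and what was done to it."""
    artifact_id: str
    source: str   # where the artifact originated
    owner: str    # who owns the artifact
    # Processing/cleaning steps applied to this artifact, in order.
    processing_steps: List[str] = field(default_factory=list)
    # Direct upstream artifacts this one was derived from.
    parents: List["ProvenanceRecord"] = field(default_factory=list)

    def add_step(self, description: str) -> None:
        """Append a processing or cleaning step to this artifact's history."""
        self.processing_steps.append(description)

    def lineage(self) -> List[str]:
        """Return the IDs of all upstream artifacts (pre-order, no duplicates)."""
        ids: List[str] = []
        for parent in self.parents:
            for upstream_id in [parent.artifact_id] + parent.lineage():
                if upstream_id not in ids:
                    ids.append(upstream_id)
        return ids


# Usage: raw patient data -> cleaned data -> trained model (all names hypothetical).
raw = ProvenanceRecord("raw_diagnoses_v1", source="hospital EHR export",
                       owner="clinical data team")
cleaned = ProvenanceRecord("cleaned_diagnoses_v1", source="derived from raw data",
                           owner="data science team", parents=[raw])
cleaned.add_step("dropped rows with missing diagnosis codes")
model = ProvenanceRecord("diagnosis_model_v1", source="training run",
                         owner="ML team", parents=[cleaned])

print(model.lineage())  # ['cleaned_diagnoses_v1', 'raw_diagnoses_v1']
```

Linking each artifact to its direct parents, rather than copying the full history into every record, keeps each record small while still letting the complete lineage of a deployed model be walked back to its raw data.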
Understanding the open provenance tracking framework
Let's look at a general provenance tracking framework to see the big picture and understand why provenance tracking is a major effort. The following diagram...