As you have seen previously, we can structure the code in a data science project as a set of pipelines that produce various artifacts: reports, models, and data. Different versions of the code produce different outputs, and data scientists often need to reproduce results or reuse artifacts from past versions of a pipeline.
This distinguishes data science projects from conventional software projects. Any version of a software product can generally be reconstructed from its source code alone, but for a data science project this is not sufficient, because the outputs also depend on the data the pipelines were run on. This creates a need to manage data versions along with the code, which is the purpose of Data Version Control (DVC). Let's see what problems arise when you try to track datasets using Git.