Ensuring reproducible builds and deployments
DevOps has many different meanings, but it is usually oriented toward enabling rapid, high-quality deployments whenever the source code changes. One way to achieve high-quality operational code is to guarantee reproducible and predictable builds, which is also crucial for creating reproducible ML pipelines. In application development, it seems obvious that the compiled binary will look and behave the same way from build to build, apart from a few minor configuration changes; the same is not true for ML pipelines.
ML engineers and data scientists face four main problems that make reproducible deployments very difficult to build:
- The development process is often performed in notebooks, so it is not always linear.
- Library and driver versions can differ between environments.
- Source data can change over time.
- Non-deterministic optimization techniques can lead to completely different outputs...
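As a rough illustration of how the last three problems can at least be made visible, the following sketch records a fingerprint for each training run: the Python and library versions, a SHA-256 hash of the source data, and the random seed used by a toy stochastic gradient descent model. The helper names (`fingerprint_data`, `environment_snapshot`, `train_linear_sgd`, `run`) and the package list are hypothetical and chosen only for this example, and only Python's standard library is used. Real frameworks such as NumPy or GPU-based training libraries have their own seeding and nondeterminism controls, which this sketch does not cover.

```python
import hashlib
import json
import platform
import random
from importlib import metadata


def fingerprint_data(rows):
    """Hash the training data so a later run can detect that the source changed."""
    digest = hashlib.sha256()
    for row in rows:
        # Serialize each row deterministically before feeding it to the hash.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def environment_snapshot(packages):
    """Record the interpreter version and the versions of selected libraries."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def train_linear_sgd(rows, seed, epochs=20, lr=0.05):
    """Toy SGD for y = w * x + b; initialization and update order are random."""
    rng = random.Random(seed)  # local RNG, so the result depends only on `seed`
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(rows)  # copy, so shuffling does not mutate the caller's data
    for _ in range(epochs):
        rng.shuffle(data)  # the update order is one source of nondeterminism
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def run(rows, seed):
    """Train once and return everything needed to reproduce or audit the run."""
    w, b = train_linear_sgd(rows, seed)
    return {
        "seed": seed,
        "data_sha256": fingerprint_data(rows),
        # Hypothetical package list; track whatever the pipeline depends on.
        "environment": environment_snapshot(["numpy", "scikit-learn"]),
        "model": {"w": w, "b": b},
    }


if __name__ == "__main__":
    rows = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]
    first = run(rows, seed=42)
    second = run(rows, seed=42)
    print(first == second)
    print(json.dumps(first, indent=2))
```

With the same data, environment, and seed, two runs produce identical records, while changing a single row changes the data hash. Storing such a record alongside each trained model makes it possible to tell which of the factors changed between two runs; it does not, on its own, remove nondeterminism that comes from parallel or GPU execution.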