Ensuring reproducible builds and deployments
DevOps has many different meanings, but it is usually about enabling rapid, high-quality deployments whenever the source code changes. One way of achieving high-quality operational code is to guarantee reproducible and predictable builds. In application development, it seems obvious that a compiled binary will look and behave much the same after a few minor configuration changes; the same is not true for the development of ML pipelines.
ML engineers and data scientists face many problems that make building reproducible deployments very difficult:
- Development often happens in notebooks, where cells can be run out of order, so the process is not always linear.
- Refactoring notebook code often breaks older notebooks.
- Library and driver versions can differ between environments.
- Source data can change between runs.
- Non-deterministic optimization techniques can lead to completely different outputs.
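Several of these problems can at least be made visible by recording what went into a run. The following is a minimal sketch using only the Python standard library, not a complete solution: it fingerprints the source data, records library versions, and shows that a seeded random number generator repeats its output. The names `fingerprint_data`, `installed_version`, `build_manifest`, and `noisy_step` are hypothetical helpers introduced here for illustration, and `noisy_step` is only a stand-in for a real stochastic optimizer.

```python
import hashlib
import platform
import random
from importlib import metadata


def fingerprint_data(raw: bytes) -> str:
    """Return a SHA-256 digest so later runs can detect changed source data."""
    return hashlib.sha256(raw).hexdigest()


def installed_version(package: str) -> str:
    """Look up an installed package version, or mark the package as missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_manifest(raw: bytes, seed: int, packages: list[str]) -> dict:
    """Collect the facts needed to compare or re-create a run."""
    return {
        "python": platform.python_version(),
        "seed": seed,
        "data_sha256": fingerprint_data(raw),
        "packages": {name: installed_version(name) for name in packages},
    }


def noisy_step(seed: int) -> list[float]:
    """Stand-in for a stochastic optimizer: the same seed yields the same numbers."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(3)]


if __name__ == "__main__":
    data = b"feature,label\n0.1,1\n0.7,0\n"  # illustrative data, not a real dataset
    print(build_manifest(data, seed=42, packages=["numpy", "scikit-learn"]))
    print(noisy_step(42) == noisy_step(42))
```

Storing such a manifest next to each trained model makes it possible to tell later whether two runs differed in data, seed, or library versions. Note that fixing a seed in this way only covers Python's own generator; ML frameworks and GPU drivers typically have their own sources of randomness that must be controlled separately.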
We discussed interactive notebooks (such...