Summary
In this chapter, we took a deep dive into how we can track code and data versions in an MLflow experiment run. We started by reviewing the different types of notebooks: Jupyter notebooks, Databricks notebooks, and VS Code notebooks. We compared them and recommended VS Code for authoring notebooks because of its IDE support, Python styling, autocompletion, and other rich features.
Then, after reviewing the limitations of existing ML pipeline API frameworks, we discussed how to create a multi-step DL pipeline using MLflow's MLproject framework. We showed a step-by-step approach to creating a three-step DL pipeline using MLproject and how to implement a pipeline function to orchestrate the necessary tasks. We also provided a Python implementation template to help you implement each pipeline task. When running a pipeline with MLflow, we can track the entire pipeline's progress with a parent run_id, and then use a child run_id for each...