Creating your training project with MLflow
You receive a specification from a data scientist: the XGBoost model is ready to move from the proof-of-concept phase to production.
We can review the original Jupyter notebook from which the data scientist initially registered the model; it is the starting point for building an ML engineering pipeline. With prototyping and training in the notebook complete, the work is ready to move to production.
Some companies productionize the notebooks themselves. This is certainly possible, but it becomes hard to sustain for the following reasons:
- It's hard to version notebooks.
- It's hard to unit-test the code.
- It's unreliable for long-running tests.
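To illustrate the unit-testing point, here is a minimal sketch of what becomes possible once logic leaves the notebook and lives in a plain Python module. The function `rolling_mean` is a hypothetical stand-in for a feature-engineering step; it is not taken from the data scientist's notebook.

```python
# Hypothetical example: a small feature-engineering helper that lives in a
# module (not a notebook cell), so a test runner such as pytest can import it.

def rolling_mean(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def test_rolling_mean():
    # Three full windows fit into four values when window=2.
    assert rolling_mean([1, 2, 3, 4], window=2) == [1.5, 2.5, 3.5]
    # No full window fits, so the result is empty.
    assert rolling_mean([1, 2], window=3) == []
```

Because the function takes plain inputs and returns plain outputs, the test runs in isolation and can be versioned alongside the code it checks.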
Splitting the work into three distinct phases ensures that the training data-generation process is reproducible, and it gives us visibility into, and a clear separation between, the different steps of the process.
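One possible way to chain separate phases is a small driver that invokes each one as an MLflow project entry point. The sketch below rests on assumptions: the phase names in `PHASES` are hypothetical placeholders, and it presumes an MLproject file in the current directory that defines entry points with those names. Only `mlflow.run` is a real MLflow call here.

```python
# Hypothetical phase names; replace them with the entry points that your
# own MLproject file actually defines.
PHASES = ("load_raw_data", "prepare_training_data", "train_model")


def run_pipeline(run_step=None, project_uri="."):
    """Run each phase in order and return (phase, result) pairs.

    `run_step` is a callable taking an entry-point name. When omitted,
    each phase is launched through `mlflow.run`.
    """
    if run_step is None:
        # Imported lazily so the driver can be exercised with a fake runner.
        import mlflow

        def run_step(entry_point):
            # Runs synchronously by default; a failing phase raises and
            # stops the pipeline before later phases start.
            return mlflow.run(uri=project_uri, entry_point=entry_point)

    results = []
    for phase in PHASES:
        results.append((phase, run_step(phase)))
    return results
```

Passing `run_step` explicitly makes it possible to check the ordering of phases without launching any real runs, for example `run_pipeline(run_step=print)`.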
We will start by organizing our MLflow project...