Training and deploying with XGBoost and MLflow
MLflow is an open-source platform for machine learning (https://mlflow.org). It was initiated by Databricks (https://databricks.com), who also brought us Spark. MLflow has lots of features, including the ability to deploy models trained in Python on SageMaker.
This section is not intended to be an MLflow tutorial. You can find documentation and examples at https://www.mlflow.org/docs/latest/index.html.
Installing MLflow
Let's set up a virtual environment for MLflow and install all of the required libraries. At the time of writing, the latest version of MLflow is 1.10, and this is the one we'll use here:
- We first initialize a new virtual environment on our local machine, named mlflow-example. Then, we activate it:
$ virtualenv mlflow-example
$ source mlflow-example/bin/activate
- We install MLflow and the libraries required by our training script:
$ pip install mlflow gunicorn pandas scikit-learn xgboost boto3
- Finally...