Persisting a trained model with Joblib
In the previous chapter, you learned how to train an estimator with scikit-learn. When building such models, you’ll likely end up with a rather complex Python script that loads your training data, pre-processes it, and trains your model with the best set of parameters. However, when deploying your model in a web application built with a framework such as FastAPI, you don’t want to repeat this script and run all those operations every time the server starts. Instead, you need a ready-to-use representation of your trained model that you can just load and use.
This is what Joblib provides. This library offers tools for efficiently saving Python objects, such as large arrays of data or function results, to disk; this operation is generally called dumping. Joblib is already a dependency of scikit-learn, so it is installed along with it and we don’t need to install it separately. Actually, scikit-learn itself uses it internally to load the bundled toy datasets.
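To make the idea of dumping concrete, here is a minimal sketch of a round trip: train a small estimator, dump it to disk, and load it back. The dataset, the choice of LogisticRegression, and the file name iris_model.joblib are assumptions made purely for illustration; they are not taken from this chapter's own example.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small estimator (illustrative choice of data and model).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Dump the trained estimator to a file, then load it back.
# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = str(Path(tmp_dir) / "iris_model.joblib")  # hypothetical file name
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)

# The loaded estimator should predict exactly like the original one.
same_predictions = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same_predictions)
```

In a real deployment, the dump step would run once at the end of the training script, and only the load step would run when the server starts. Since Joblib relies on pickle under the hood, only load files that come from a source you trust.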
As we’ll see, dumping a...