Summary
In this chapter, we learned different ways to deploy an MLflow inference pipeline model for both batch inference and online real-time inference. We started with a brief survey of different model serving scenarios (batch, streaming, and on-device) and looked at three categories of tools for MLflow model deployment (the MLflow built-in deployment tool, MLflow deployment plugins, and generic model inference serving frameworks that can work with MLflow inference models). Then, we covered several local deployment scenarios, using a PySpark UDF for batch inference and MLflow's local deployment for web service inference. Afterward, we learned how to use Ray Serve in conjunction with the mlflow-ray-serve plugin to deploy an MLflow Python inference pipeline model into a local Ray cluster. This opens the door to deploying to any cloud platform, such as AWS, Azure ML, or GCP, as long as we can set up a Ray cluster in the cloud. Finally, we provided a complete end-to-end...
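To recap the batch-inference pattern mentioned above, the following is a minimal sketch of wrapping a logged MLflow model as a PySpark UDF. The model URI and the input/output paths are hypothetical placeholders, not values from this chapter:

```python
# Minimal batch-inference sketch: score a Spark DataFrame with an MLflow model
# wrapped as a PySpark UDF. Model URI and file paths below are hypothetical.
from pyspark.sql import SparkSession
import mlflow.pyfunc

spark = SparkSession.builder.appName("mlflow_batch_inference").getOrCreate()

# Wrap the logged MLflow pyfunc model as a Spark UDF.
model_uri = "models:/nlp_inference_pipeline/1"  # hypothetical registered model
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type="string")

# Apply the UDF to the input column(s) to score the DataFrame in batch.
input_df = spark.read.parquet("path/to/input.parquet")    # hypothetical input
scored_df = input_df.withColumn("prediction", predict_udf("text"))
scored_df.write.mode("overwrite").parquet("path/to/output.parquet")  # hypothetical output
```

For the local web service scenario, the same logged model can instead be served with MLflow's built-in CLI, for example `mlflow models serve -m <model_uri> -p 5000`, which starts a local REST scoring endpoint.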