Deploying and operating models
Once you have trained and optimized an ML model, it is ready for deployment. In practice, many data science teams stop here and move the model to production as a Docker image, often embedded in a REST API using Flask or a similar framework. However, depending on your use case requirements, this is not always the best solution. An ML or data engineer's responsibility doesn't stop here.
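To make this concrete, here is a minimal sketch of such a Flask-based scoring service. The model path `model.pkl`, the `/predict` endpoint, and the JSON payload format are illustrative assumptions, not a prescribed interface:

```python
# A minimal sketch of a Flask scoring service (illustrative assumptions:
# the model is a pickled scikit-learn-style estimator stored at model.pkl,
# and clients POST a JSON list of feature rows to /predict).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)  # hypothetical artifact produced during training

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a payload such as {"instances": [[5.1, 3.5, 1.4, 0.2], ...]}
    payload = request.get_json(force=True)
    predictions = model.predict(payload["instances"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # In production, this would typically run behind a WSGI server (e.g.,
    # gunicorn) inside the Docker image, not Flask's development server.
    app.run(host="0.0.0.0", port=5000)
```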
The quality of a deployed ML pipeline is best assessed by testing the model on live data in production. Such tests collect the insights and data needed to continuously improve the model. Hence, tracking model performance over time is an essential step in guaranteeing and improving the model's performance.
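As a simple illustration of what such tracking can look like, the following sketch logs each prediction together with a timestamp so that predictions can later be joined with ground-truth labels to measure performance over time. The JSON-lines file and the `log_prediction` helper are assumptions for illustration, not part of any particular framework:

```python
# Illustrative sketch: persist each prediction with a timestamp so it can
# later be joined with ground-truth labels to compute accuracy over time.
# The file name and record layout are assumptions, not a standard.
import json
from datetime import datetime, timezone

def log_prediction(features, prediction, log_path="predictions.jsonl"):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage inside a scoring service:
# log_prediction([5.1, 3.5, 1.4, 0.2], "setosa")
```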
In general, we differentiate between two architectures for ML scoring pipelines, which we will briefly discuss in this section (a batch-scoring sketch follows the list):
- Batch scoring using pipelines
- Real-time scoring using a container-based web service
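The real-time case resembles the Flask service sketched earlier. For contrast, here is a minimal batch-scoring sketch that reads a batch of records, scores them with the same pickled model, and writes the results back out. The file paths and pandas-based I/O are illustrative assumptions; in a real pipeline, this step would be scheduled by an orchestrator and read from a data store rather than local files:

```python
# Illustrative batch-scoring sketch: load a pickled model, score a batch of
# records from a CSV file, and write the predictions back to disk.
# The paths here are assumptions for illustration.
import pickle

import pandas as pd

def run_batch_scoring(input_path="batch_input.csv",
                      output_path="batch_scored.csv",
                      model_path="model.pkl"):
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    batch = pd.read_csv(input_path)
    batch["prediction"] = model.predict(batch)
    batch.to_csv(output_path, index=False)

if __name__ == "__main__":
    run_batch_scoring()
```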