You're reading from Machine Learning Model Serving Patterns and Best Practices A definitive guide to deploying, monitoring, and providing accessibility to ML models in production

Product type Paperback

Published in Dec 2022

Publisher Packt

ISBN-13 9781803249902

Length 336 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Machine Learning

Author (1):

Md Johirul Islam

Table of Contents (22) Chapters

Preface

1. Part 1: Introduction to Model Serving

2. Chapter 1: Introducing Model Serving

3. Chapter 2: Introducing Model Serving Patterns

4. Part 2: Patterns and Best Practices of Model Serving

5. Chapter 3: Stateless Model Serving

6. Chapter 4: Continuous Model Evaluation

7. Chapter 5: Keyed Prediction

8. Chapter 6: Batch Model Serving

9. Chapter 7: Online Learning Model Serving

10. Chapter 8: Two-Phase Model Serving

11. Chapter 9: Pipeline Pattern Model Serving

12. Chapter 10: Ensemble Model Serving Pattern

13. Chapter 11: Business Logic Pattern

14. Part 3: Introduction to Tools for Model Serving

15. Chapter 12: Exploring TensorFlow Serving

16. Chapter 13: Using Ray Serve

17. Chapter 14: Using BentoML

18. Part 4: Exploring Cloud Solutions

19. Chapter 15: Serving ML Models using a Fully Managed AWS Sagemaker Cloud Solution

20. Index

21. Other Books You May Enjoy

Understanding the importance of model serving

Model serving is one of the critical steps in the ML life cycle, but it is often neglected. As shown in Figure 1.7, users can only start using a model after it has been served. Model serving is therefore a key step in the business success of a data science or ML team.

Figure 1.7 – ML life cycle

A lot of models remain unserved simply because model serving is hard. This happens mainly due to the following reasons:

Separation of responsibilities: Often, model development is assigned to data scientists and serving is assigned to engineers, and there may be a gap between the domain knowledge of the two groups. For example, consider a data scientist who has developed a model in a notebook and is getting some predictions from it, and a software engineer who will be serving the model. The product manager asks for the model to be made available on a server in a production-ready state, but some challenges come up:
- How should they save the model?
- Are the tools the software engineer is using compatible with the model the data scientist has developed?
- Data scientists tend to use a lot of fancy libraries to make models; are all those libraries supported in the platform the software engineering team is using?
- How will they maintain the version of the model? Is the model developed by data scientists easily maintainable?
- What kind of serving mechanism is needed?
- Which endpoint needs to be published for prediction?
- How does the data from users need to be processed on the server?

Challenges like these create barriers to serving a model after development.
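Several of the questions above (how to save the model, which features it expects, how user data is processed on the server) can be made concrete with a small handoff convention. The following is a minimal sketch, not a prescribed method: the toy linear model stored as a plain dictionary, the file names `model.pkl` and `metadata.json`, and the functions `save_model`, `load_model`, and `handle_request` are all assumptions made for illustration.

```python
import json
import pickle
import platform
import tempfile
from pathlib import Path


def save_model(model: dict, directory: Path, feature_names) -> None:
    """Data scientist side: write the artifact plus metadata the serving team can inspect."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    metadata = {
        # Recording the runtime helps spot incompatibilities before deployment.
        "python_version": platform.python_version(),
        # The order of features the model expects at prediction time.
        "feature_names": list(feature_names),
    }
    (directory / "metadata.json").write_text(json.dumps(metadata, indent=2))


def load_model(directory: Path):
    """Engineer side: load the artifact and its metadata."""
    with open(directory / "model.pkl", "rb") as f:
        model = pickle.load(f)
    metadata = json.loads((directory / "metadata.json").read_text())
    return model, metadata


def predict(model: dict, row) -> int:
    """Toy linear classifier: 1 if bias + weights . row is positive, else 0."""
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], row))
    return 1 if score > 0 else 0


def handle_request(body: str, model: dict, metadata: dict):
    """Turn a JSON request body into a (status_code, response) pair."""
    try:
        payload = json.loads(body)
        # Arrange the user's data in the order the model expects.
        row = [float(payload[name]) for name in metadata["feature_names"]]
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": f"bad input: {exc}"}
    return 200, {"prediction": predict(model, row)}


if __name__ == "__main__":
    model_dir = Path(tempfile.mkdtemp())
    # Hypothetical parameters and feature names, for illustration only.
    save_model({"weights": [0.5, -0.25], "bias": 0.1}, model_dir, ["age", "income"])
    served_model, served_metadata = load_model(model_dir)
    print(handle_request('{"age": 2, "income": 1}', served_model, served_metadata))
    print(handle_request('{"age": 2}', served_model, served_metadata))
```

In a real service, `handle_request` would sit behind whichever endpoint the team chooses to publish. Keeping it as a plain function makes the contract between the two groups easy to test before any serving tool is picked.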

Choice of tool: New tools for serving models keep appearing. This makes serving more challenging, because a particular tool may be the perfect choice for one type of model, and you need to know which tool is best for which model type. For example, for a simple model, a simple REST API developed using Flask may be sufficient. However, for complex models, a Flask API can be less effective, as Flask was not designed for stateful client-server communication. Switching from one tool to another can be a problem, and identifying which tool is better for which problem can be challenging. Therefore, as the number of tools increases, so do the confusion and challenges in model serving.

Versioning: In software engineering, versioning software is straightforward: we only need to redeploy the new updates through a continuous integration/continuous deployment (CI/CD) pipeline. However, in model serving, versioning is challenging because a new version involves new feature engineering, retraining with new data, and evaluation. Ensuring backward compatibility is not an easy task, as a change that breaks it can lead to an error state in the ML system. Versioning models can be frustrating.

Rollback: In software engineering, let’s say a deployment to production is behaving incorrectly after the last change; we can then roll back the last change and return to the previous state very quickly. However, in model serving, we cannot rely on this shortcut to roll back to the last model.
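To illustrate why a model version is more than a code revision, the sketch below records, for each version, the features it expects, a fingerprint of its training data, and its evaluation metrics. This is only one possible way to organize such information: `ModelVersion`, `ModelRegistry`, `fingerprint`, and the compatibility rule used here are hypothetical names and choices, not the API of any particular tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Everything that defines one model version, not just the code."""
    version: str
    feature_names: tuple
    training_data_hash: str
    metrics: dict


def fingerprint(rows) -> str:
    """Hash the training rows so a version records which data produced it."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()[:12]


class ModelRegistry:
    """In-memory registry; a real one would persist artifacts and metadata."""

    def __init__(self):
        self._versions = {}
        self._order = []

    def register(self, model_version: ModelVersion) -> None:
        if model_version.version in self._versions:
            raise ValueError(f"version {model_version.version} already registered")
        self._versions[model_version.version] = model_version
        self._order.append(model_version.version)

    def latest(self) -> ModelVersion:
        return self._versions[self._order[-1]]

    def is_backward_compatible(self, old: str, new: str) -> bool:
        """Assumed rule: the new version must not require features the old one lacked."""
        old_features = set(self._versions[old].feature_names)
        new_features = set(self._versions[new].feature_names)
        return new_features <= old_features


if __name__ == "__main__":
    registry = ModelRegistry()
    # Feature names, data rows, and metric values are made up for illustration.
    registry.register(ModelVersion(
        "1.0", ("age", "income"), fingerprint([(30, 50000)]), {"accuracy": 0.90}))
    registry.register(ModelVersion(
        "2.0", ("age", "income", "region"),
        fingerprint([(30, 50000), (41, 62000)]), {"accuracy": 0.92}))
    print(registry.latest().version)
    print(registry.is_backward_compatible("1.0", "2.0"))
```

Under this assumed rule, version 2.0 is not backward compatible with 1.0 because it needs a feature that existing clients do not send. Returning to an earlier version would likewise mean restoring the matching feature set and data, which is one way to read the rollback difficulty described above.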

The preceding points gave us an idea about the challenges involved in model serving. In the following section, we will introduce you to some existing tools for model serving.