What this book covers
Chapter 1, Introducing Model Serving, introduces model serving and explains why it is important to the success of data science and machine learning projects.
Chapter 2, Introducing Model Serving Patterns, describes how model serving patterns help you identify the best serving approach for a particular problem while following best practices. We also introduce you to the different types of serving patterns.
Chapter 3, Stateless Model Serving, discusses how stateless model serving can help improve customer experiences, and the advantages of stateless serving in resilient and scalable model serving.
Chapter 4, Continuous Model Evaluation, introduces you to continuous model evaluation after serving and why it is important. We also discuss some techniques to evaluate the model continuously.
Chapter 5, Keyed Prediction, introduces you to keyed prediction patterns and discusses how passing keys can be beneficial when returning inferences to clients. We also discuss some ideas for generating keys.
Chapter 6, Batch Model Serving, discusses batch and offline model serving and how the inference can be updated during batch serving. We also discuss different techniques for updating the model periodically in batch serving.
Chapter 7, Online Learning Model Serving, discusses how we can serve models where real-time inferences are needed, along with some of the techniques and challenges in online serving.
Chapter 8, Two-Phase Model Serving, discusses serving two models in parallel, where one model is strong and the other model is weak. This chapter also discusses the necessity of two-phase serving and some ideas and challenges related to it.
Chapter 9, Pipeline Pattern Model Serving, introduces how models can be served using pipelines structured as directed acyclic graphs.
Chapter 10, Ensemble Model Serving Pattern, introduces the idea of combining multiple models in serving. It also shows how we can ensemble models in different ways and how the response sent to the client is a combined outcome from multiple models.
Chapter 11, Business Logic Pattern, discusses how business logic is used along with inference code to serve models.
Chapter 12, Exploring TensorFlow Serving, gives a high-level introduction to using the TensorFlow Serving tool to serve a model.
Chapter 13, Using Ray Serve, introduces the Ray Serve tool for serving machine learning models, with examples of how to use the tool to serve models following a few of the patterns we have discussed.
Chapter 14, Using BentoML, introduces the BentoML tool for serving models, with examples of using BentoML in the ensemble pattern and the business logic pattern.
Chapter 15, Serving ML Models using a Fully Managed AWS Sagemaker Cloud Solution, discusses how we can serve models using a fully managed cloud solution. We use Amazon SageMaker to show you, at a high level, how you can serve models using the built-in services provided by a fully managed cloud solution.