Deploying a multi-model endpoint
Multi-model endpoints are useful when you're dealing with a large number of models that it wouldn't make sense to deploy to individual endpoints. For example, imagine a SaaS company building a regression model for each one of their 10,000 customers. Surely they wouldn't want to manage (and pay for) 10,000 endpoints!
Understanding multi-model endpoints
A multi-model endpoint can serve CPU-based predictions from an arbitrary number of models stored in S3 (GPUs are not supported at the time of writing). The path of the model artifact to use is passed in each prediction request. Models are loaded and unloaded dynamically, according to usage and the amount of memory available on the endpoint. Models can also be added to, or removed from, the endpoint by simply copying or deleting artifacts in S3.
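For instance, invoking such an endpoint with boto3 only requires passing the relative path of the artifact through the TargetModel parameter. The snippet below is a minimal sketch: the endpoint name, the artifact path, and the CSV payload are placeholders, not values from this chapter.

import boto3

smrt = boto3.client('sagemaker-runtime')

# TargetModel selects which artifact (relative to the endpoint's S3 model
# prefix) should be loaded, if it isn't already in memory, and used for
# this prediction. 'mme-demo' and the artifact path are hypothetical.
response = smrt.invoke_endpoint(
    EndpointName='mme-demo',
    TargetModel='customer-0042/model.tar.gz',
    ContentType='text/csv',
    Body=b'0.12,3.45,6.78'
)

print(response['Body'].read().decode('utf-8'))

Adding a brand new model is just as simple: copy its artifact under the same S3 prefix, and reference it in TargetModel on the next request.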
In order to serve multiple models, your inference container must implement a specific set of APIs that the endpoint will invoke: LOAD...