Inferencing using SageMaker
In this section, you will learn how to create an endpoint using SageMaker instead of an EKS cluster. First, we will describe the framework-independent way of creating inference endpoints (the Model class). Then, we will look at creating TensorFlow (TF) endpoints using the TensorFlowModel class and the TF-specific Estimator class. The next section will focus on endpoint creation for PyTorch models using the PyTorchModel class and the PyTorch-specific Estimator class. Furthermore, we will introduce how to build an endpoint from an ONNX model. At this point, we should have a service running model predictions for incoming requests. After that, we will describe how to improve the performance of the service using AWS SageMaker Neo and the Elastic Inference (EI) accelerator. Finally, we will cover autoscaling and describe how to host multiple models on a single endpoint.
As described in the Utilizing SageMaker for ETL section in Chapter 5, Data Preparation in the Cloud, SageMaker provides a built-in notebook...