Deploying a pre-trained model to a real-time inference endpoint
In this section, we will use the SageMaker Python SDK to deploy a pre-trained model to a real-time inference endpoint. As the name suggests, a real-time inference endpoint processes input payloads and returns predictions in real time. If you have built an API endpoint before (one that processes GET and POST requests, for example), then you can think of an inference endpoint as an API endpoint that accepts an input request and returns a prediction as part of its response. How are predictions made? The inference endpoint simply loads the model into memory and uses it to process the input payload, yielding an output that is returned as the response. For example, if we had a pre-trained sentiment analysis ML model deployed in a real-time inference endpoint, it would return a response of either "POSITIVE" or "NEGATIVE", depending on the input string payload provided in the request...
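To make this concrete, here is a minimal sketch of what such a deployment might look like with the SageMaker Python SDK, using a pre-trained sentiment analysis model from the Hugging Face Hub. The model ID, instance type, and framework version strings here are illustrative assumptions, not values prescribed by this section; check the SDK documentation for the container versions supported in your region.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with SageMaker permissions (this call assumes we are running
# inside a SageMaker environment; otherwise, pass a role ARN explicitly)
role = sagemaker.get_execution_role()

# Pre-trained sentiment analysis model pulled from the Hugging Face Hub.
# The model ID and task chosen here are illustrative assumptions.
hub_config = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
    "HF_TASK": "text-classification",
}

# The framework versions below are assumptions; use a combination
# supported by the available SageMaker deep learning containers
model = HuggingFaceModel(
    env=hub_config,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# deploy() provisions the real-time inference endpoint
# (this typically takes a few minutes)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Real-time inference: the endpoint loads the model into memory and
# returns a prediction as the response to each request payload
result = predictor.predict({"inputs": "I love this product!"})
print(result)  # e.g., [{'label': 'POSITIVE', 'score': 0.99...}]

# Delete the endpoint when done to avoid ongoing charges
predictor.delete_endpoint()
```

Note that the endpoint keeps running (and incurring charges) until it is explicitly deleted, which is why the sketch ends with a `delete_endpoint()` call.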