Real-time inference
As discussed earlier in this chapter, real-time inference is needed when results must be returned with very low latency. Everyday use cases include face detection, fraud detection, defect and anomaly detection, and sentiment analysis in live chats. In Amazon SageMaker, real-time inference is carried out by deploying a model to SageMaker hosting services as a real-time endpoint. Figure 7.11 shows a typical SageMaker machine learning workflow that uses a real-time endpoint.
Figure 7.11 – Example architecture of a SageMaker real-time endpoint
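To make the idea of calling a real-time endpoint concrete, the following is a minimal sketch of the client side of such a request. It assumes the endpoint accepts a single CSV row and replies with JSON, which depends entirely on the deployed model container. The helper names (to_csv_row, predict), the endpoint name my-realtime-endpoint, and the FakeRuntimeClient stand-in are hypothetical and exist only so that the example runs without AWS access. In real use, the client would be created with boto3.client("sagemaker-runtime"), whose invoke_endpoint call takes the same keyword arguments.

```python
import io
import json


def to_csv_row(features):
    """Serialize one feature vector as a single CSV line (no header)."""
    return ",".join(str(value) for value in features)


def predict(runtime_client, endpoint_name, features):
    """Send one record to a real-time endpoint and return the parsed reply.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    The CSV-in, JSON-out contract is an assumption made for this sketch.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=to_csv_row(features),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read().decode("utf-8"))


class FakeRuntimeClient:
    """Hypothetical stand-in for the real client, used only for illustration."""

    def invoke_endpoint(self, EndpointName, ContentType, Accept, Body):
        # Echo back some facts about the request instead of a real prediction.
        num_features = len(Body.split(","))
        payload = json.dumps(
            {"endpoint": EndpointName, "num_features": num_features}
        )
        return {"Body": io.BytesIO(payload.encode("utf-8"))}


result = predict(FakeRuntimeClient(), "my-realtime-endpoint", [0.5, 1.2, 3])
print(result)
```

Passing the client in as an argument keeps the request logic separate from AWS credentials and region configuration, so the same function can be exercised locally with a stand-in and later pointed at a deployed endpoint.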
In this figure, we first read our data from an Amazon S3 bucket. Data preprocessing and feature engineering are carried out on this data using Amazon SageMaker Processing. A machine learning model is then trained on the processed data, followed by evaluation of the results and any post-processing. After...