Real-time inference versus batch inference
SageMaker provides two ways to obtain inferences:
- Real-time inference returns a single inference, or a small number of inferences, per request with very low latency from a live inference endpoint.
- Batch inference lets you get a large number of inferences from a batch processing job.
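Batch inference is generally more efficient and more cost-effective, because its compute instances run only for the duration of the job instead of staying up around the clock like a live endpoint. Use it whenever your inference requirements allow. We'll explore batch inference first and then turn to real-time inference.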
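As a rough illustration of why batch inference tends to cost less, the sketch below compares monthly instance-hours for an endpoint that stays up all day with those for a job that runs briefly each night. The hourly rate, instance counts, and job duration are hypothetical placeholders, not SageMaker prices. Actual rates depend on instance type and region.

```python
# Back-of-the-envelope cost comparison. HOURLY_RATE is a hypothetical
# placeholder, not a real SageMaker price.
HOURLY_RATE = 0.25  # hypothetical USD per instance-hour


def endpoint_monthly_cost(instances: int, days: int = 30) -> float:
    """A live endpoint accrues instance-hours around the clock."""
    return instances * 24 * days * HOURLY_RATE


def batch_monthly_cost(instances: int, job_hours: float, days: int = 30) -> float:
    """A nightly batch job accrues instance-hours only while it runs."""
    return instances * job_hours * days * HOURLY_RATE


if __name__ == "__main__":
    # One always-on instance versus two instances for 1.5 hours per night.
    print(f"endpoint: ${endpoint_monthly_cost(instances=1):.2f}")  # $180.00
    print(f"batch:    ${batch_monthly_cost(instances=2, job_hours=1.5):.2f}")  # $22.50
```

Even with twice as many instances, the batch job is far cheaper here because it is billed for a few hours per day rather than twenty-four.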
Batch inference
In many cases, you can make inferences in advance and store them for later use. For example, if you want to generate product recommendations for users on an e-commerce site, those recommendations might be based on each user's prior purchases and on the products you want to promote the next day. You can generate the recommendations nightly and store them so that your site can retrieve them when users browse.
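The nightly-precompute pattern can be sketched in plain Python. Everything here is hypothetical: the purchase history, the promoted products, the `RELATED` co-purchase table, the scoring rule, and the function names are made up for illustration, and the returned dict stands in for whatever database or key-value store the site would actually read from. In SageMaker, the scoring step would be a batch job running your trained model rather than this toy rule.

```python
from collections import Counter

# Hypothetical inputs for the nightly job.
PURCHASES = {  # user -> products bought previously
    "alice": ["tent", "stove", "lantern"],
    "bob": ["novel", "bookmark"],
}
RELATED = {  # product -> products often bought with it (toy data)
    "tent": ["sleeping_bag", "lantern"],
    "stove": ["fuel", "headlamp"],
    "lantern": ["headlamp"],
    "novel": ["bookmark", "reading_light"],
    "bookmark": ["novel"],
}
PROMOTED = {"lantern", "bookmark", "headlamp"}  # products to push tomorrow


def nightly_batch_job(purchases, related, promoted, k=2):
    """Precompute top-k recommendations for every user in one pass."""
    store = {}
    for user, bought in purchases.items():
        owned = set(bought)
        scores = Counter()
        # One point for each purchased item that links to an unowned candidate.
        for item in bought:
            for candidate in related.get(item, []):
                if candidate not in owned:
                    scores[candidate] += 1
        # Extra point for products the business wants to promote.
        for candidate in list(scores):
            if candidate in promoted:
                scores[candidate] += 1
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda c: (-scores[c], c))
        store[user] = ranked[:k]
    return store


def get_recommendations(store, user):
    """Request-time path: a cheap lookup, no model call."""
    return store.get(user, [])


if __name__ == "__main__":
    store = nightly_batch_job(PURCHASES, RELATED, PROMOTED)
    print(get_recommendations(store, "alice"))  # ['headlamp', 'fuel']
    print(get_recommendations(store, "carol"))  # [] (no precomputed entry)
```

Because the expensive scoring happens offline, the request path reduces to a dictionary lookup. The tradeoff is that recommendations can be up to a day stale, and users who are missing from the nightly run get nothing until the next one.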
There...