Deploying ML models for real-time inference
Real-time inference means generating predictions for a small number of records using a model deployed behind a REST endpoint, with the expectation that predictions are returned within a few milliseconds.
Real-time deployments are needed when the features are only available at serving time and cannot be pre-computed. These deployments are more complex to manage than batch or streaming deployments.
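To make the request/response pattern concrete, here is a minimal sketch of scoring a few records against a served model over REST. The endpoint URL, token, and the `dataframe_records` payload layout are illustrative assumptions; check the contract of your own endpoint before relying on them.

```python
import requests

# Placeholder values -- substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://<workspace-host>/serving-endpoints/churn-model/invocations"
API_TOKEN = "<personal-access-token>"

def score_records(records: list[dict]) -> list:
    """POST a small batch of records to a REST model endpoint and return its predictions."""
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        # Payload shape is an assumption; many serving layers accept a list of
        # feature dictionaries under a key such as "dataframe_records".
        json={"dataframe_records": records},
        timeout=5,  # real-time callers expect an answer well within this bound
    )
    response.raise_for_status()
    return response.json()["predictions"]

# Example call: score a single record with two illustrative features.
print(score_records([{"tenure_months": 12, "plan": "basic"}]))
```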
Databricks offers integrated model serving endpoints, enabling you to prototype, develop, and deploy real-time inference models on production-grade, fully managed infrastructure within the Databricks environment (see the endpoint-creation sketch after the following list). At the time of writing this book, there are two additional methods you can use to deploy your models for real-time inference:
- Managed solutions provided by the following cloud providers:
  - Azure ML
  - AWS SageMaker
  - GCP Vertex AI
- Custom solutions that use Docker and Kubernetes or a similar set of technologies
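As an example of the integrated option, the following sketch creates a Databricks model serving endpoint with the Databricks Python SDK. It assumes a model version already registered in Unity Catalog; the endpoint name, model name, and sizing values are placeholders, and the exact config fields may differ across SDK versions.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()  # picks up workspace credentials from the environment

# Create a serving endpoint backed by a registered model version.
# Endpoint name, model name, and version below are placeholders.
w.serving_endpoints.create(
    name="churn-model-endpoint",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="ml.models.churn_classifier",  # assumed Unity Catalog model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```

Enabling scale-to-zero lets the endpoint release compute when idle, trading a cold-start delay on the first request for lower cost while the endpoint is unused.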