Understanding real-time inferencing and batch scoring
Models can be deployed to support different use cases and business requirements, and how you deploy a model to production should be driven by those requirements. If a prediction must be available in real time, to support streaming data or interactive use of predictions by other applications, then real-time inferencing is required. Real-time inferencing needs compute resources that are active and available at all times so the model can respond to each request as it arrives. If your application can instead tolerate predictions that are generated periodically and stored in a file or a database, then batch inferencing is the correct selection. Batch inferencing lets you spin compute resources up for each scoring run and spin them back down afterward.
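To make the batch pattern concrete, here is a minimal sketch (not tied to any particular Azure service). It assumes a previously trained scikit-learn model saved as model.pkl and an input file new_records.csv whose columns are all model features; every file name here is illustrative:

```python
import joblib
import pandas as pd

# Minimal batch-scoring sketch: load a saved model, score every record
# accumulated since the last run, and persist the results to a file.
model = joblib.load("model.pkl")            # hypothetical trained model artifact
batch = pd.read_csv("new_records.csv")      # hypothetical input records

batch["prediction"] = model.predict(batch)  # score all rows in one pass
batch.to_csv("scored_records.csv", index=False)  # results land in a file or database
```

Because the whole job runs start to finish, the compute that executes it only needs to exist for the duration of the run, which is what allows batch inferencing to scale compute up and down on demand.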
Before model deployment, you will need to select the compute for hosting the real-time web service. For real-time inferencing, Azure Kubernetes Service (AKS), Azure Container Instances (ACI), and Azure Arc-enabled Kubernetes clusters are the available compute targets.
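As a sketch of what selecting a compute target and deploying look like with the Azure Machine Learning Python SDK v1, the following deploys a registered model to ACI; the model name, service name, entry script, and conda file are assumptions for illustration:

```python
from azureml.core import Environment, Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()        # reads the workspace details from config.json
model = Model(ws, name="my-model")  # hypothetical registered model

# The entry script (score.py) defines init() and run() for the web service;
# the conda file lists the packages the scoring environment needs.
env = Environment.from_conda_specification(name="inference-env",
                                           file_path="environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# ACI compute configuration for the hosted endpoint
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "my-realtime-service", [model],
                       inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)  # clients POST JSON to this URI for predictions
```

ACI is typically used for dev/test-scale endpoints, while AKS is the usual choice for production-scale, low-latency workloads; switching to AksWebservice.deploy_configuration() and supplying an AKS deployment target is the main change.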