Inference optimizations and alternative deployment targets
With Azure Machine Learning deployments, it's quite easy to get your first experimental service up and running. Because models and environments are versioned and abstracted, it is painless to deploy the same model and environment to different compute targets. However, it is much harder to know beforehand how many resources your model will consume, and how to optimize your model or deployment for higher inference throughput.
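To make the first point concrete, here is a minimal sketch using the v1 azureml-core SDK that deploys one registered model and environment to two different compute targets. The workspace configuration and the model, environment, and cluster names are placeholders for the artifacts registered in your own workspace:

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice, AksWebservice

ws = Workspace.from_config()

# The same registered model and environment are reused for both targets
model = Model(ws, name="my-model")  # hypothetical model name
env = Environment.get(ws, name="my-inference-env")  # hypothetical environment name
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy to Azure Container Instances for quick experimentation
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
aci_service = Model.deploy(ws, "my-model-aci", [model],
                           inference_config, aci_config)
aci_service.wait_for_deployment(show_output=True)

# Deploy the identical model and environment to an AKS cluster
aks_target = AksCompute(ws, "my-aks-cluster")  # hypothetical cluster name
aks_config = AksWebservice.deploy_configuration(cpu_cores=2, memory_gb=4)
aks_service = Model.deploy(ws, "my-model-aks", [model], inference_config,
                           aks_config, deployment_target=aks_target)
aks_service.wait_for_deployment(show_output=True)
```

Only the deployment configuration and target change between the two calls; the model, environment, and scoring script stay untouched.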
Profiling models for optimal resource configuration
Azure Machine Learning provides a handy tool for evaluating the resources required by your ML model deployment: model profiling. It helps you estimate the number of CPUs and the amount of memory needed to operate your scoring service at a specific throughput.
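In the v1 azureml-core SDK, this is exposed through `Model.profile()`. The following is a minimal end-to-end sketch, assuming the same placeholder workspace, model, and environment names as in the previous sketch and a hypothetical request payload; it registers a dataset of sample requests and submits the profiling run:

```python
import json

from azureml.core import Dataset, Datastore, Environment, Workspace
from azureml.core.model import InferenceConfig, Model

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # hypothetical model name
env = Environment.get(ws, name="my-inference-env")  # hypothetical name
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Profiling replays recorded requests against the model, so it needs a
# dataset of sample request payloads: one serialized JSON body per line.
# The payload shape below is a placeholder for your model's actual input.
sample_request = json.dumps({"data": [[0.1, 0.2, 0.3]]})
with open("sample_requests.txt", "w") as f:
    f.write("\n".join([sample_request] * 100))

# Upload the request file to the default datastore and register it as
# a tabular dataset that the profiling run can consume
datastore = Datastore.get_default(ws)
datastore.upload_files(["./sample_requests.txt"], target_path="profiling")
input_dataset = Dataset.Tabular.from_delimited_files(
    path=[(datastore, "profiling/sample_requests.txt")],
    separator="\n", header=False)
input_dataset = input_dataset.register(
    ws, name="sample_requests", create_new_version=True)

# Start the profiling run and wait for it to finish
profile = Model.profile(ws, "my-model-profile", [model], inference_config,
                        input_dataset=input_dataset)
profile.wait_for_completion(show_output=True)

# The details include the recommended CPU and memory configuration
print(profile.get_details())
```

The reported details give you a data-driven starting point for the CPU and memory settings of your deployment configuration, instead of guessing them.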
Let's walk through profiling the model that we trained in the real-time scoring example:
- First, you need to define...