Observing the results
The recommendation provided by Inference Recommender includes instance metrics, performance metrics, and cost metrics.
Instance metrics include InstanceType, InitialInstanceCount, and EnvironmentParameters, which are tuned according to the job for better performance.
Performance metrics include MaxInvocations and ModelLatency, whereas cost metrics include CostPerHour and CostPerInference.
These metrics enable you to make informed trade-offs between cost and performance. For example, if your business requirement is overall price performance with an emphasis on throughput, then you should focus on CostPerInference. If your requirement is a balance between latency and throughput, then you should focus on the ModelLatency and MaxInvocations metrics.
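To make the trade-off concrete, here is a minimal sketch of how you might pick a recommendation for each of the two requirements described above. It assumes the recommendations have already been flattened into plain dictionaries keyed by the metric names. The sample values and the helper names cheapest_per_inference and best_balance are hypothetical and only for illustration; they are not output from a real job or part of any SageMaker API.

```python
# Hypothetical, flattened recommendations. All numbers are made up for
# illustration and do not come from a real Inference Recommender job.
recommendations = [
    {"InstanceType": "ml.c5.large", "InitialInstanceCount": 1,
     "MaxInvocations": 300, "ModelLatency": 80,
     "CostPerHour": 0.10, "CostPerInference": 0.000004},
    {"InstanceType": "ml.c5.2xlarge", "InitialInstanceCount": 1,
     "MaxInvocations": 1100, "ModelLatency": 25,
     "CostPerHour": 0.40, "CostPerInference": 0.000006},
    {"InstanceType": "ml.g4dn.xlarge", "InitialInstanceCount": 1,
     "MaxInvocations": 2400, "ModelLatency": 12,
     "CostPerHour": 0.90, "CostPerInference": 0.000007},
]


def cheapest_per_inference(recs):
    """Price-performance focus: pick the lowest CostPerInference."""
    return min(recs, key=lambda r: r["CostPerInference"])


def best_balance(recs, max_latency):
    """Latency/throughput balance: among recommendations whose ModelLatency
    is at or below max_latency, pick the highest MaxInvocations.
    Returns None when no recommendation meets the latency bound."""
    eligible = [r for r in recs if r["ModelLatency"] <= max_latency]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["MaxInvocations"])


if __name__ == "__main__":
    print(cheapest_per_inference(recommendations)["InstanceType"])
    print(best_balance(recommendations, max_latency=30)["InstanceType"])
```

The latency bound passed to best_balance is in whatever unit the job reports for ModelLatency. In practice, you would choose that bound from your own service-level requirements rather than from the sample value used here.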
You can view the results of the Inference Recommender job either through an API call or in the SageMaker Studio UI.
The following is the code snippet for observing the results:
… data = [   ...