Next steps
Now that we have seen how to deploy and score a deep learning model, feel free to explore other challenges that often accompany the consumption of models:
- How do we scale scoring for massive workloads, for example, serving 1 million predictions every second?
- How do we keep scoring within a certain round-trip time budget? For example, the round trip between a request arriving and the score being served cannot exceed 20 milliseconds. You can also think about ways to optimize such DL models at deployment time, such as batch inference and quantization.
- Heroku is a popular deployment option. You can deploy a simple ONNX model on Heroku under a free tier, either without a frontend or with a simple frontend that just uploads a file. You can go a step further and serve the model behind a production server, such as Uvicorn, Gunicorn, or Waitress.
- It is also possible to save the model as a `.pt` file and...
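As a concrete starting point for the batch-inference idea mentioned above, the sketch below groups incoming requests into fixed-size batches so the model is called once per batch rather than once per item, which usually raises throughput. The names `score_in_batches` and `fake_model` are illustrative placeholders, not part of the chapter's code; in practice `predict` would be your real model's batched inference call.

```python
from typing import Callable, List


def batch_iter(items: List[float], batch_size: int):
    """Yield successive fixed-size chunks from a list of pending requests."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(requests: List[float],
                     predict: Callable[[List[float]], List[float]],
                     batch_size: int = 32) -> List[float]:
    """Score all requests, amortizing one model call over batch_size inputs."""
    results: List[float] = []
    for batch in batch_iter(requests, batch_size):
        results.extend(predict(batch))  # one call scores the whole batch
    return results


# Illustrative stand-in for a real model's batched predict() method
def fake_model(batch: List[float]) -> List[float]:
    return [x * 2.0 for x in batch]


print(score_in_batches([1.0, 2.0, 3.0, 4.0, 5.0], fake_model, batch_size=2))
```

The same pattern applies when the "requests" arrive asynchronously: a server can buffer incoming inputs briefly and flush them as one batch, trading a little latency for much higher throughput.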