Introducing inference optimization features
In the previous chapters, we have seen how we can leverage MXNet, GluonCV, and GluonNLP to retrieve models pre-trained on certain datasets (such as ImageNet, MS COCO, or IWSLT2015) and use them for our specific tasks and datasets. Furthermore, we used transfer learning and fine-tuning techniques to improve performance on those tasks and datasets.
In this recipe, we will introduce (and revisit) several concepts and features that optimize our inference loops, improving runtime performance, and we will analyze the trade-offs involved.
Getting ready
As in previous chapters, in this recipe we will be using some matrix operations and linear algebra, but nothing too advanced.
How to do it...
In this recipe, we will be carrying out the following steps:
- Hybridizing our models
- Applying float16 and AMP for inference
- Applying INT8 quantization
- Profiling our models
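To give a flavor of the kind of trade-off these steps involve, the following is a minimal, framework-agnostic sketch of the idea behind the INT8 quantization step: float32 values are mapped to 8-bit integers through a single symmetric scale factor, shrinking storage 4x at the cost of a small, bounded reconstruction error. The helper names (`quantize_int8`, `dequantize_int8`) are hypothetical; in practice MXNet performs this internally when we quantize a model.

```python
import numpy as np

# Conceptual sketch of symmetric INT8 quantization (not MXNet's actual API).
# One scale maps the float32 range [-max|x|, +max|x|] onto [-127, 127].

def quantize_int8(x):
    """Quantize a float32 array to int8 with a single symmetric scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float32 array from int8 values."""
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)

# The trade-off: 4x less memory, with rounding error bounded by scale / 2.
print(q.nbytes, x.nbytes)              # 1000 vs 4000 bytes
print(float(np.abs(x - x_hat).max()))  # at most ~scale / 2
```

The same reasoning applies to the float16/AMP step, except that there the mapping is a dtype cast rather than an integer rescaling, so the precision loss is relative rather than governed by a fixed scale.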
Let’...