In Chapter 3, Training Networks, we learned how to train deep neural networks using Caffe2. In this chapter, we focus on inference: deploying a trained model in the field to infer results on new data. For efficient inference, a trained model is typically optimized for the accelerator on which it is deployed. This chapter covers two popular accelerators, GPUs and CPUs, along with the inference engines used to deploy Caffe2 models on them: TensorRT for NVIDIA GPUs and OpenVINO for Intel CPUs.
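To make the idea of inference concrete before we turn to accelerator-specific engines, the following is a minimal sketch of running a trained Caffe2 model on new data using the `workspace.Predictor` API. The file names, the input shape, and the random stand-in input are placeholder assumptions; substitute the protobuf files and input blob of your own trained model:

```python
import numpy as np
from caffe2.python import workspace

# Load the two serialized protobufs that make up a trained Caffe2 model.
# The file names here are placeholders for your own trained model.
with open("init_net.pb", "rb") as f:
    init_net = f.read()
with open("predict_net.pb", "rb") as f:
    predict_net = f.read()

# Create a predictor from the trained model.
predictor = workspace.Predictor(init_net, predict_net)

# Run inference on new data (a random image-shaped tensor as a stand-in).
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
results = predictor.run([input_data])
print(results[0].shape)  # e.g. (1, 1000) for an ImageNet classifier
```

This runs the model as-is on whatever hardware Caffe2 was built for; the inference engines discussed in this chapter go further by optimizing the trained model for a specific accelerator before deployment.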
In this chapter, we will look at the following topics:
- Inference engines
- NVIDIA TensorRT
- Intel OpenVINO