In this chapter, we learned about inference engines and why they are an essential tool for the final deployment of a trained Caffe2 model on accelerators. We focused on two popular types of accelerators: NVIDIA GPUs and Intel CPUs. We looked at how to install and use TensorRT to deploy our Caffe2 model on NVIDIA GPUs. We also looked at the installation and use of OpenVINO to deploy our Caffe2 model on Intel CPUs and accelerators.
Many other companies, such as Google, Facebook, and Amazon, as well as start-ups such as Habana and Graphcore, are developing new accelerator hardware for the inference of DL models. There are also efforts, such as ONNX Runtime, that bring together the inference engines from multiple vendors under one umbrella. Please evaluate these options and choose the accelerator hardware and software that works best for deploying your Caffe2 model.
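As a brief illustration of that umbrella idea, here is a minimal, hedged sketch using ONNX Runtime's Python API. It assumes an ONNX Runtime build in which the TensorRT and OpenVINO execution providers are available (they ship in separate builds, such as onnxruntime-gpu), and a hypothetical model.onnx file exported from Caffe2 with an example image-sized input; your model path and input shape will differ:

```python
import numpy as np
import onnxruntime as ort

# Preferred execution providers, in order: TensorRT (NVIDIA GPUs),
# OpenVINO (Intel CPUs/accelerators), then the default CPU provider.
preferred = [
    "TensorrtExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]

# Keep only the providers actually available in this build, so the
# session creation does not fail on a machine without TensorRT/OpenVINO.
available = ort.get_available_providers()
providers = [p for p in preferred if p in available]

# "model.onnx" is a hypothetical path to a model exported from Caffe2.
session = ort.InferenceSession("model.onnx", providers=providers)

# Run inference on a dummy input; shape (1, 3, 224, 224) is an assumption.
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```

The point of this design is that the same session code runs unchanged on different hardware; only the provider list decides whether TensorRT, OpenVINO, or the plain CPU backend executes the model.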
In...