Using Optimum to optimize PyTorch model deployment
Model deployment is a crucial stage of the machine learning life cycle. Hugging Face's Optimum aims to reduce the complexity of deploying AI models across diverse platforms, languages, frameworks, and devices. As the name suggests, Optimum also helps optimize a model before deployment.
In this section, we will take a pre-trained model (trained using PyTorch) from the Hugging Face Hub and convert it into an Open Neural Network Exchange (ONNX) model so that it can be used for inference with ONNX Runtime, as shown in Figure 19.7.
Note
ONNX Runtime is an open-source, high-performance inference engine developed by Microsoft, designed to efficiently execute models in the ONNX format across a variety of hardware, including Intel CPUs, NVIDIA GPUs, NVIDIA Jetson devices, and Android phones.
We discussed ONNX in Chapter 13, Operationalizing PyTorch Models into Production...