Faster inference using ONNX
Open Neural Network Exchange (ONNX) is an open format that enables faster inference for trained models. The optimum
library, in turn, makes it easy to export Hugging Face models, and even entire pipelines, to ONNX. The usage and implementation are straightforward, as you will see:
- The first thing to do is install the optimum and onnxruntime libraries:

  ```bash
  $ pip install optimum[onnxruntime]
  ```
- The next step is to load the pipeline using the optimum pipeline:

  ```python
  from optimum.pipelines import pipeline

  pipe = pipeline(
      "text-classification",
      "cardiffnlp/twitter-xlm-roberta-base-sentiment",
      accelerator="ort",
  )
  ```
- Two types of accelerators exist: ONNX Runtime (ORT) and BetterTransformer. ORT handles the ONNX export of the model. For this specific example, we picked a multilingual sentiment analysis model based on XLM-RoBERTa. Now that it has been converted, you can easily run the pipeline, as sketched below.
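To illustrate, here is a minimal sketch of calling the converted pipeline. The sample sentence is purely illustrative; the exact label names and scores depend on the model (this one predicts positive, neutral, or negative sentiment):

```python
# Run the ORT-accelerated pipeline on a sample sentence
# (the input text is an illustrative example, not from the book).
result = pipe("I love using ONNX Runtime for fast inference!")

# The pipeline returns a list of dicts, each holding a predicted
# label and a confidence score, along the lines of:
# [{'label': 'positive', 'score': 0.98}]  # actual values will vary
print(result)
```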