Faster Transformer model serving using TFX
TFX provides a faster and more efficient way to serve deep learning-based models, but there are a few key points to understand before you use it. Most importantly, the model must be in TensorFlow's SavedModel format so that it can be served through the TFX Docker image or the CLI. Let's take a look:
- You can perform TFX model serving with a model in TensorFlow's SavedModel format. For more information about TensorFlow SavedModels, you can read the official documentation at https://www.tensorflow.org/guide/saved_model. To produce a SavedModel from Transformers, you can simply use the following code:
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained(
    "nateraw/bert-base-uncased-imdb", from_pt=True)
model.save_pretrained("tfx_model", saved_model=True)
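Running this export should create the SavedModel under tfx_model/saved_model/1, since save_pretrained writes the SavedModel into a numbered version subdirectory, which is the layout TensorFlow Serving expects. As a quick sanity check (a minimal sketch, assuming TensorFlow is installed and the export above succeeded), you can reload the model and inspect its serving signature:

import tensorflow as tf

# Reload the exported SavedModel; the version subdirectory "1" is
# created by save_pretrained when saved_model=True.
loaded = tf.saved_model.load("tfx_model/saved_model/1")

# Models exported for serving expose a "serving_default" signature;
# its specs show the tensor names and shapes the server will accept.
serving_fn = loaded.signatures["serving_default"]
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)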
- Before we look at how to use it to serve Transformers, you need to pull the Docker image for TFX:
$ docker pull...
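The pull command above is truncated here. TFX serving relies on TensorFlow Serving under the hood, whose official image is tensorflow/serving, so the following is a hedged sketch of starting a REST endpoint with it; the bind-mounted path reuses the tfx_model/saved_model directory from the earlier export, and the model name bert is purely illustrative:

$ docker run -p 8501:8501 \
    --mount type=bind,source=$(pwd)/tfx_model/saved_model,target=/models/bert \
    -e MODEL_NAME=bert -t tensorflow/serving

Once the container is up, TensorFlow Serving's standard REST route http://localhost:8501/v1/models/bert can be used to check the model's status before sending prediction requests.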