Faster Transformer model serving using TFX
TFX provides a faster and more efficient way to serve deep learning models, but there are a few key requirements to understand before using it. The model must be in TensorFlow's SavedModel format so that it can be served by the TFX Docker image or the CLI. Let's take a look:
- You can perform TFX model serving with a model in TensorFlow's SavedModel format. For more information about TensorFlow SavedModels, see the official documentation at https://www.tensorflow.org/guide/saved_model. To create a SavedModel from a Transformers model, you can use the following code:
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained(
    "nateraw/bert-base-uncased-imdb", from_pt=True)
model.save_pretrained("tfx_model", saved_model=True)
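After saving, it is worth verifying that the export actually contains a servable signature before handing it to TFX. The following is a minimal sanity check, assuming the default layout produced by save_pretrained, which places the SavedModel under tfx_model/saved_model/1:

import tensorflow as tf

# Load the exported SavedModel back; the version subdirectory "1" is the
# default created by save_pretrained when saved_model=True.
loaded = tf.saved_model.load("tfx_model/saved_model/1")

# A servable model should expose at least the "serving_default" signature.
print(list(loaded.signatures.keys()))

If serving_default appears in the printed list, the exported model is ready to be served by the TFX Docker image or the CLI.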
...