Implementing the model with TensorFlow
We will now implement the model we just studied. First let’s import a few things:
import tensorflow_hub as hub  # for loading pretrained models from TensorFlow Hub
import tensorflow as tf
import tensorflow.keras.backend as K  # low-level backend operations
Implementing the ViT model
Next, we are going to download the pretrained ViT model from TensorFlow Hub. We will be using a model submitted by Sayak Paul. The model is available at https://tfhub.dev/sayakpaul/vit_s16_fe/1. You can see other Vision Transformer models available at https://tfhub.dev/sayakpaul/collections/vision_transformer/1.
image_encoder = hub.KerasLayer("https://tfhub.dev/sayakpaul/vit_s16_fe/1", trainable=False)
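As an aside, the model's name encodes its architecture: "s" denotes ViT-Small, "16" is the patch size, and "fe" marks it as a feature extractor. A quick back-of-the-envelope sketch (plain Python, purely illustrative) shows how many patch tokens a 224×224 input produces under this scheme:

```python
# ViT-S/16 splits each 224x224 image into non-overlapping 16x16 patches.
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size  # 224 / 16 = 14
num_patches = patches_per_side ** 2          # 14 * 14 = 196 patch tokens

print(num_patches)  # 196
```

Each of these 196 patches is embedded and processed as a token by the transformer encoder.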
We then define an input layer to feed images into the model and pass it to the image_encoder
to get the final feature vector for each image:
image_input = tf.keras.layers.Input(shape=(224, 224, 3))
image_features = image_encoder(image_input)
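To see how these two lines fit together, here is a minimal sketch of wrapping the input and encoder into a Keras functional-API model. Since downloading the actual Hub module requires network access, this sketch substitutes a stand-in encoder (a pooling layer plus a Dense layer, purely illustrative) that plays the same role of mapping a batch of images to feature vectors:

```python
import tensorflow as tf

# Stand-in for the TF Hub ViT encoder (illustrative only): any callable that
# maps (batch, 224, 224, 3) images to (batch, feature_dim) vectors fits here.
stand_in_encoder = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(384),  # ViT-S/16 emits 384-dimensional features
])

image_input = tf.keras.layers.Input(shape=(224, 224, 3))
image_features = stand_in_encoder(image_input)

model = tf.keras.Model(inputs=image_input, outputs=image_features)
print(model.output_shape)  # (None, 384)
```

In the real pipeline, image_encoder (the hub.KerasLayer loaded above) takes the place of stand_in_encoder; the wiring is identical.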
You can inspect the shape of the final image representation by running:
...