Now let's deploy the complete model as a RESTful service. To do so, we will write inference code that loads the latest checkpoint and makes a prediction for a given image.
Look at the inference.py file in the repository. The code is similar to the training loop, except that we don't use teacher forcing here: the input to the decoder at each time step is its own previous prediction, along with the hidden state and the encoder output.
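The feedback loop described above can be sketched in isolation. This is a minimal illustration, not the repository's actual code: the decoder below is a hypothetical stand-in that mimics a call signature of the form decoder(dec_input, features, hidden), so the example can run without a trained model. The point is simply that each step's input is the previous step's prediction.

```python
# Greedy decoding sketch: no teacher forcing, so the prediction at
# step t becomes the decoder input at step t + 1.

def fake_decoder(dec_input, features, hidden):
    # Hypothetical stand-in for the trained decoder: deterministically
    # "predicts" the next token id so the loop is runnable as-is.
    next_token = (dec_input + 1) % 5
    return next_token, hidden

def greedy_decode(features, start_token=0, end_token=4, max_len=10):
    hidden = 0               # dummy decoder hidden state
    dec_input = start_token  # first input is the <start> token
    result = []
    for _ in range(max_len):
        predicted, hidden = fake_decoder(dec_input, features, hidden)
        if predicted == end_token:   # stop once <end> is predicted
            break
        result.append(predicted)
        dec_input = predicted        # feed the prediction back in
    return result

print(greedy_decode(features=None))  # → [1, 2, 3]
```

With the real model, fake_decoder would be replaced by the trained decoder and the token ids mapped back to words through the tokenizer.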
An important step is loading the trained model into memory. For this we use tf.train.Checkpoint, which restores all of the learned weights for the optimizer, encoder, and decoder. Here is the code for that:
import os
import tensorflow as tf

checkpoint_dir = './my_model'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
# Track the same objects that were checkpointed during training,
# then restore their weights from the most recent checkpoint.
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
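Once the checkpoint is restored, the model can be exposed over HTTP. Here is a minimal sketch of one way to do that with Flask; the predict_caption helper is hypothetical (standing in for the real preprocessing, encoder, and decoder loop), and the repository may use a different framework or route names.

```python
# Minimal REST endpoint sketch: accept an uploaded image, return a caption.
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_caption(image_bytes):
    # Hypothetical helper: in the real service this would preprocess the
    # image, run the encoder, and greedily decode a caption.
    return "a placeholder caption"

@app.route("/predict", methods=["POST"])
def predict():
    # Expect the image under the multipart form field "image".
    image_bytes = request.files["image"].read()
    return jsonify({"caption": predict_caption(image_bytes)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A client would then POST an image file to /predict and receive the caption as JSON.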