Defining the LSTM
Now that we have defined the data generator to output a batch of data, starting with a batch of image feature vectors followed by the captions for the respective images word by word, we will define the LSTM cell. The definition of the LSTM and the training procedure are similar to what we observed in the previous chapter.
We will first define the parameters of the LSTM cell. The input gate, the forget gate, the output gate, and the candidate value calculation each have two sets of weights (one for the current input and one for the previous hidden state) and a bias:
# Input gate (i_t) - How much memory to write to cell state
# Connects the current input to the input gate
ix = tf.Variable(tf.truncated_normal([embedding_size, num_nodes], stddev=0.01))
# Connects the previous hidden state to the input gate
im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.01))
# Bias of the input gate
ib = tf.Variable(tf.random_uniform([1, num_nodes], 0.0, 0.01))
# Forget gate (f_t) - How much memory to discard from cell state
# Connects the current input...
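To make the role of these parameters concrete, the following is a minimal sketch of a single LSTM step written in plain Python rather than TensorFlow. It assumes the standard LSTM formulation, in which every gate combines the current input and the previous hidden state through its own pair of weight matrices and a bias. The helper names (`init_gate`, `gate_preactivation`, `lstm_step`), the dictionary keys, and the tiny sizes are hypothetical and chosen only for illustration, and a plain Gaussian stands in for the truncated normal initializer.

```python
import math
import random


def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n x m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_gate(rng, input_size, num_nodes, scale=0.01):
    """Create one gate's parameters: input weights, hidden-state weights, bias.

    Assumption: rng.gauss is a stand-in for a truncated normal initializer.
    """
    Wx = [[rng.gauss(0.0, scale) for _ in range(num_nodes)] for _ in range(input_size)]
    Wh = [[rng.gauss(0.0, scale) for _ in range(num_nodes)] for _ in range(num_nodes)]
    b = [rng.uniform(0.0, scale) for _ in range(num_nodes)]
    return Wx, Wh, b


def gate_preactivation(x, h_prev, gate_params):
    """Compute x*Wx + h_prev*Wh + b for one gate."""
    Wx, Wh, b = gate_params
    return [a + c + d for a, c, d in zip(matvec(x, Wx), matvec(h_prev, Wh), b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell; returns the new (hidden, cell) state."""
    i = [sigmoid(z) for z in gate_preactivation(x, h_prev, params["input"])]
    f = [sigmoid(z) for z in gate_preactivation(x, h_prev, params["forget"])]
    o = [sigmoid(z) for z in gate_preactivation(x, h_prev, params["output"])]
    g = [math.tanh(z) for z in gate_preactivation(x, h_prev, params["candidate"])]
    # Keep part of the old memory (f) and write part of the candidate (i).
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The output gate decides how much of the cell state is exposed.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Tiny illustrative sizes, not the values used in the chapter.
embedding_size, num_nodes = 4, 3
rng = random.Random(0)
gate_names = ("input", "forget", "output", "candidate")
params = {name: init_gate(rng, embedding_size, num_nodes) for name in gate_names}

x = [0.5, -0.1, 0.3, 0.2]
h, c = lstm_step(x, [0.0] * num_nodes, [0.0] * num_nodes, params)
```

Each gate owns exactly three parameter tensors, which mirrors the `ix`, `im`, and `ib` pattern of the input gate in the listing above. A TensorFlow version would replace the list arithmetic with matrix multiplications over whole batches.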