Training of seq2seq
That's all very interesting, but how is it related to RL? The connection lies in the training process of the seq2seq model; before we get to the modern RL approaches to the problem, though, we need to say a couple of words about the standard way the training is carried out.
Log-likelihood training
Imagine that we need to create a machine translation system from one language (say, French) into another language (English) using the seq2seq model. Let's assume that we have a good, large dataset of sample translations consisting of French-English sentence pairs that we're going to train our model on. How do we do this?
The encoding part is straightforward: we just apply our encoder RNN to the first sentence in the training pair, which produces an encoded representation of the sentence. The obvious candidate for this representation is the hidden state returned from the last RNN step. During the encoding stage, we ignore the RNN's outputs, taking into account only the hidden state from the last RNN application.
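As a minimal sketch of this encoding step, here is what it might look like in PyTorch, assuming an LSTM encoder and an already-tokenized and embedded French sentence; the names and sizes (EMB_SIZE, HID_SIZE, and so on) are illustrative rather than taken from any particular implementation:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not fixed by the method)
EMB_SIZE = 64
HID_SIZE = 128

# The encoder RNN; batch_first=True means input shape is (batch, seq, features)
encoder = nn.LSTM(input_size=EMB_SIZE, hidden_size=HID_SIZE, batch_first=True)

# Placeholder for an embedded French sentence: 1 sentence, 7 tokens
french_embedded = torch.randn(1, 7, EMB_SIZE)

# outputs holds the RNN output at every step; (h_n, c_n) is the final hidden state
outputs, (h_n, c_n) = encoder(french_embedded)

# During encoding we discard `outputs` and keep only the last hidden state,
# which acts as the fixed-size representation of the whole sentence
encoded_representation = h_n  # shape: (1, 1, HID_SIZE)
print(encoded_representation.shape)
```

This fixed-size vector is then handed to the decoder, which is the part we turn to next.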