Theory of sequence-to-sequence models
Sequence-to-sequence models are very similar to the conventional neural network structures we have seen so far. The main difference is that the model's output is itself a sequence, rather than a binary or multi-class prediction. This is particularly useful in tasks such as translation, where we may wish to convert a whole sentence into another language.
In the following example, we can see that our English-to-Spanish translation maps word to word:
The first word in our input sentence maps nicely to the first word in our output sentence. If every word lined up this way, in the same order, for all sentences and all languages, we could simply pass each word in our sentence one by one through our trained model to get an output sentence, and there would be no need for any sequence-to-sequence modeling, as shown here:
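The word-by-word idea can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EN_TO_ES` lookup table is a hypothetical stand-in for a trained per-word model, and the example sentence and the `translate_word_by_word` helper are not taken from the text.

```python
# Hypothetical lookup table standing in for a trained per-word model.
# The vocabulary below is an illustrative assumption.
EN_TO_ES = {
    "the": "el",
    "cat": "gato",
    "eats": "come",
    "fish": "pescado",
}


def translate_word_by_word(sentence, table):
    """Translate each token independently, preserving the input order."""
    tokens = sentence.lower().split()
    # Words missing from the table are passed through unchanged.
    return " ".join(table.get(token, token) for token in tokens)


aligned = translate_word_by_word("The cat eats fish", EN_TO_ES)
print(aligned)  # el gato come pescado
```

Note that this approach always produces exactly one output word per input word, in the same order. It only works when the two sentences happen to align this neatly.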