Almost all the deep learning algorithms we have seen in this book are good at learning to map training data to its corresponding labels. We cannot use them directly for tasks where the model needs to learn from one sequence and then generate another sequence or an image. Some example applications are:
- Language translation
- Image captioning
- Image generation (seq2img)
- Speech recognition
- Question answering
Most of these problems can be seen as some form of sequence-to-sequence mapping, and they can be solved using a family of architectures called encoder–decoder architectures. In this section, we will learn the intuition behind these architectures. We will not cover the implementation of these networks in full, as they deserve a more detailed study of their own.
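That said, a minimal sketch can make the intuition concrete. The following is an illustrative example, assuming PyTorch; the vocabulary sizes, dimensions, and the choice of a GRU are hypothetical, not from this book. The key idea it shows is that the encoder compresses a variable-length input sequence into a fixed-size hidden state, and the decoder generates the output sequence conditioned on that state:

```python
# A minimal encoder-decoder (seq2seq) sketch, assuming PyTorch.
# All sizes below are illustrative placeholders.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token indices
        embedded = self.embedding(src)
        _, hidden = self.rnn(embedded)
        return hidden  # fixed-size summary of the whole input sequence

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, hidden):
        # tgt: (batch, tgt_len) token indices; hidden: the encoder's summary
        embedded = self.embedding(tgt)
        output, hidden = self.rnn(embedded, hidden)
        return self.out(output), hidden  # per-step vocabulary logits

# Usage: encode a source batch, then decode conditioned on its summary.
encoder = Encoder(vocab_size=1000, embed_dim=64, hidden_dim=128)
decoder = Decoder(vocab_size=1200, embed_dim=64, hidden_dim=128)
src = torch.randint(0, 1000, (2, 7))  # two source sequences of length 7
tgt = torch.randint(0, 1200, (2, 5))  # two target sequences of length 5
logits, _ = decoder(tgt, encoder(src))
print(logits.shape)  # torch.Size([2, 5, 1200])
```

Note how the input and output sequences can have different lengths (and even different vocabularies); the fixed-size hidden state is the only bridge between them, which is what lets this family of architectures handle sequence-to-sequence tasks such as translation.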
At a high level, an encoder–decoder architecture would look like the...