In the preceding two chapters, we learned about convolutional neural networks and recurrent neural networks: CNNs have proven very effective for vision tasks such as object recognition and image captioning, while RNNs have excelled at sequential tasks such as machine translation. But we have also seen their limitations. In particular, RNNs struggle to capture long-term dependencies, because information must be carried through every intermediate step of the sequence. In this chapter, we will cover attention mechanisms, which have grown rapidly in popularity and achieved state-of-the-art results in both language- and vision-related tasks.
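To preview the core idea before we dive in, here is a minimal NumPy sketch of scaled dot-product attention, the variant used in Transformers: every query position computes a weighted sum over all positions in a single step, rather than passing information through a chain of recurrent states. The array shapes and the `softmax` helper here are illustrative choices for this sketch, not code from any specific library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_len_q, d_k) queries
    K: (seq_len_k, d_k) keys
    V: (seq_len_k, d_v) values
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise similarity of queries and keys
    weights = softmax(scores, axis=-1)   # each query's distribution over all positions
    return weights @ V                   # weighted sum of values

# Toy example: 4 query positions attending over 6 key/value positions
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Notice that the distance between two positions never enters the computation: position 1 can attend to position 6 just as easily as to position 2, which is precisely what recurrence makes difficult. We will build up to this formulation step by step over the course of the chapter.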
The following topics will be covered in this chapter:
- Overview of attention
- Understanding neural Turing machines
- Exploring the types of attention
- Transformers
Let's get started!