Attention Models
Attention models were first introduced in 2014 by Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio in their seminal paper, Neural Machine Translation by Jointly Learning to Align and Translate (https://arxiv.org/abs/1409.0473), which achieved state-of-the-art results on English-to-French translation. Since then, the idea has been applied to many sequence-processing tasks with great success, and attention models have become increasingly popular. While a detailed explanation and mathematical treatment is beyond the scope of this book, let's understand the intuition behind an idea that many leading figures in deep learning consider a significant development in how we approach sequence modeling.
The intuition behind attention is best understood with an example from the task for which it was developed: translation. When a novice human translates a long sentence between languages, they don't translate the entire sentence in one go. They break the original sentence down into...