Other Architectures and Developments
The attention mechanism architecture described in the last section is only one way of building an attention mechanism. In recent years, several other architectures have been proposed that constitute the state of the art in deep learning for NLP. In this section, we will briefly mention some of these architectures.
Transformer
In 2017, researchers at Google introduced an attention-based architecture in their seminal paper titled "Attention Is All You Need." This architecture is considered state-of-the-art in the NLP community. The transformer makes use of a multi-head attention mechanism, which computes attention in several parallel subspaces so the model can attend to different aspects of the input at once. Additionally, it employs residual connections to ensure that the vanishing gradient problem has minimal impact on learning. Because the transformer dispenses with recurrence, its computations can be parallelized across sequence positions, allowing a massive speed-up of the training phase while providing better-quality results.
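To make these ideas concrete, the sketch below shows a minimal transformer encoder block in PyTorch: multi-head self-attention and a feed-forward network, each wrapped in a residual connection followed by layer normalization. The dimensions (d_model=512, num_heads=8, d_ff=2048) follow the base configuration from the paper; the class and variable names are illustrative, not the paper's reference code.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """A minimal transformer encoder block (illustrative sketch)."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        # Multi-head attention splits d_model across num_heads subspaces.
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Position-wise feed-forward network applied to each token.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connection around self-attention helps gradients flow.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Residual connection around the feed-forward sublayer.
        x = self.norm2(x + self.ff(x))
        return x

# Example: a batch of 2 sequences, 10 tokens each, embedding size 512.
block = TransformerBlock()
out = block(torch.randn(2, 10, 512))
print(out.shape)  # torch.Size([2, 10, 512])
```

Note that nothing in this block depends on the previous token's output, which is what lets the transformer process all positions of a sequence in parallel during training.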
The most commonly used package...