Summary
Our focus in this chapter was the attention mechanism and transformers. We started with the seq2seq model and discussed Bahdanau and Luong attention in that context. Next, we gradually introduced the TA mechanism before discussing the full encoder-decoder transformer architecture. Finally, we focused on the encoder-only and decoder-only transformer variants.
In the next chapter, we’ll focus on LLMs and explore the Hugging Face transformers library.