In this chapter, we focused on seq2seq models and the attention mechanism. First, we discussed and implemented a regular recurrent encoder-decoder seq2seq model and learned how to complement it with the attention mechanism. Then, we discussed and implemented a purely attention-based model called the transformer, and we defined multi-head attention in that context. Next, we looked at transformer-based language models such as BERT, Transformer-XL, and XLNet. Finally, we implemented a simple text-generation example with the help of the transformers library.
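As a quick reminder of how little code such a text-generation example requires, here is a minimal sketch using the transformers library's pipeline API; the model name and generation parameters are illustrative assumptions and not necessarily the ones used earlier in the chapter:

```python
# Minimal text-generation sketch with the transformers library.
# The model and parameters below are illustrative, not the chapter's exact code.
from transformers import pipeline

# Create a text-generation pipeline backed by a pretrained GPT-2 model
generator = pipeline('text-generation', model='gpt2')

# Generate a short continuation of the prompt
outputs = generator('The transformer architecture is',
                    max_length=40,
                    num_return_sequences=1)

print(outputs[0]['generated_text'])
```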
This chapter concludes our series of chapters on natural language processing. In the next chapter, we'll talk about some new trends in deep learning that aren't yet fully mature but hold great potential for the future.