Summary
In this chapter, we looked at two distinct hybrid types of neural networks. First, we looked at the transformer model – an attention-based architecture with no recurrent connections that has outperformed recurrent models on multiple sequential tasks. We ran through an exercise where we built, trained, and evaluated a transformer model on a language modeling task with the WikiText-2 dataset using PyTorch. During this exercise, we explored the transformer architecture in detail, both through annotated architectural diagrams and the relevant PyTorch code.
We concluded the first section by briefly discussing the successors of transformers – models such as BERT, GPT, and so on. We demonstrated how PyTorch makes it possible to load pre-trained versions of most of these advanced models in fewer than five lines of code.
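As a minimal sketch of what such loading looks like, assuming the Hugging Face transformers library is installed (the model name and API calls shown here are illustrative, not necessarily the exact ones used in the chapter):

from transformers import BertModel, BertTokenizer

# Download (or load from the local cache) a pre-trained BERT model and its tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

With the model loaded this way, it can be fine-tuned or used directly for inference, which is what makes these pre-trained architectures so quick to get started with.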
In the second and final section of this chapter, we picked up where we left off in Chapter 3, Deep CNN Architectures, where...