Advanced Hybrid Models
In the previous three chapters, we learned extensively about the various convolutional and recurrent network architectures available, along with their implementations in PyTorch. In this chapter, we will look at some other deep learning model architectures that have proven successful on various machine learning tasks and are neither purely convolutional nor purely recurrent in nature. We will continue from where we left off in both Chapter 2, Deep CNN Architectures, and Chapter 4, Deep Recurrent Model Architectures.
First, we will explore transformers, which, as we learned toward the end of Chapter 4, Deep Recurrent Model Architectures, have outperformed recurrent architectures on various sequential tasks (including language modeling, where they underpin LLMs) and have lately become the de facto model architecture for all kinds of tasks (multimodal models, generative AI, etc.). Then, we will pick up from the EfficientNets discussion at the end of Chapter 2, Deep CNN Architectures, and explore...