Transformers
Transformer models changed the game for most machine learning problems involving sequential data, advancing the state of the art by a significant margin over the previous leaders, RNN-based models. One of the primary reasons the Transformer is so performant is that it has access to the whole sequence of items (e.g. a sequence of tokens) at once, whereas RNN-based models look at one item at a time. The term Transformer has come up several times in earlier discussions as a method that has outperformed other sequential models such as LSTMs and GRUs. Now, we will learn about Transformer models in more detail.
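The claim that a Transformer sees the whole sequence at once can be made concrete with the scaled dot-product attention operation at its core. The following is a minimal NumPy sketch (the function name and toy dimensions are illustrative, not from this chapter): every position produces a weight over all positions in the sequence in a single step, rather than consuming one item per time step as an RNN does.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Scores compare every query position against every key position,
    # so each output row is a weighted mix of the WHOLE sequence.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # (seq_len, seq_len)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # 4 tokens, 8-dimensional embeddings
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape, w.shape)         # (4, 8) (4, 4)
```

Note that the `(4, 4)` weight matrix is what gives each token simultaneous access to all four positions; an RNN would instead need four sequential steps to propagate that information.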
In this chapter, we will first learn about the Transformer model in detail. Then we will discuss the details of a specific model from the Transformer family known as Bidirectional Encoder Representations from Transformers (BERT). We will see how we can use this model to complete a question-answering task.
Specifically, we will cover...