The T5 model
Raffel et al. (2019) focused on designing a standard text input format that produces text output for every NLP task. The Google T5 team did not want to try new architectures derived from the Original Transformer, such as BERT-like encoder-only stacks or GPT-like decoder-only stacks. Instead, the team focused on defining NLP tasks in a standard text-to-text format.
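To make the text-to-text idea concrete, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name t5-small and the exact prompts are illustrative assumptions, not taken from the paper:

```python
# A minimal sketch of T5's text-to-text format, assuming the Hugging Face
# transformers library and the "t5-small" checkpoint (illustrative choices).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text with a task prefix,
# and every answer comes back as plain text.
prompts = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",
    "summarize: T5 casts every NLP problem as a text-to-text problem.",
]
for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same model, the same training objective, and the same decoding procedure serve translation, acceptability judgment, and summarization; only the text prefix changes.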
They chose to use the Original Transformer model we defined in Chapter 2, Getting Started with the Architecture of the Transformer Model, as we can see in Figure 13.4:
Figure 13.4: The Original Transformer model used by T5
Raffel et al. (2019) kept most of the Original Transformer architecture and terminology, though they emphasized certain key aspects and made some vocabulary and functional changes. The following list contains the main aspects of the T5 model:
- The encoder and decoder remain in the model. The encoder and decoder layers become “blocks,” and the sublayers become “subcomponents” containing a self-attention layer and a feedforward network, as the sketch below illustrates.
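The block and subcomponent vocabulary is also visible in the Hugging Face implementation of T5. The following sketch, again assuming the transformers library and the illustrative t5-small checkpoint, prints the subcomponents of the first encoder block:

```python
# A small sketch, assuming the Hugging Face transformers library,
# that surfaces T5's "block" and "subcomponent" terminology.
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Encoder layers are stored as "block"s, matching the paper's vocabulary.
first_block = model.encoder.block[0]
for subcomponent in first_block.layer:
    # Prints T5LayerSelfAttention, then T5LayerFF (the feedforward network)
    print(type(subcomponent).__name__)
```

Decoder blocks additionally contain a cross-attention subcomponent between the self-attention and feedforward subcomponents, since the decoder attends to the encoder's output.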