Summary
In this chapter, we saw how T5 transformer models standardized the inputs fed to the encoder and decoder stacks of the original Transformer. The original Transformer architecture has an identical structure for each block (or layer) of its encoder and decoder stacks. However, the original Transformer did not define a standardized input format for NLP tasks.
Raffel et al. (2019) designed a standard input for a wide range of NLP tasks by defining a text-to-text model. They added a prefix to each input sequence indicating the type of NLP problem to solve, which led to a standard text-to-text format. The Text-To-Text Transfer Transformer (T5) was born. We saw that this deceptively simple evolution made it possible to use the same model and hyperparameters for a wide range of NLP tasks. The invention of T5 took the standardization of transformer models a step further.
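To make the prefix mechanism concrete, here is a minimal sketch of prefix-driven, text-to-text inference. It assumes the Hugging Face transformers library and the public t5-small checkpoint; these are illustrative choices, not necessarily the exact setup used in this chapter.

```python
# Minimal sketch: one T5 model, several NLP tasks selected purely by the
# text prefix. Assumes the Hugging Face transformers library and the
# public "t5-small" checkpoint (illustrative, not the chapter's exact setup).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The same model and hyperparameters handle every task; only the prefix changes.
tasks = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",
    "summarize: The state has passed a new law to regulate ...",
]

for text in tasks:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=60)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that the task prefix is plain text: no task-specific head or architectural change is needed, which is exactly what makes the format standard.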
We then implemented a T5 model that could summarize any text. We tested the model on texts...
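The summarization step itself can be sketched along the following lines, again assuming the Hugging Face transformers library; the t5-base checkpoint and the generation settings shown are assumptions for illustration, not the chapter's exact values.

```python
# Hedged sketch of a T5 summarization helper. The "t5-base" checkpoint and
# the generation hyperparameters are assumptions for illustration only.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def summarize(text: str, max_input_tokens: int = 512) -> str:
    # T5 selects the task through its prefix, so the document is
    # prepended with "summarize: " before tokenization.
    input_ids = tokenizer(
        "summarize: " + text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    ).input_ids
    output_ids = model.generate(
        input_ids,
        num_beams=4,
        min_length=30,
        max_length=120,
        length_penalty=2.0,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(summarize("The Transformer relies entirely on attention mechanisms ..."))
```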