Working with text-to-text models
The encoder (on the left) and the decoder (on the right) of the transformer are connected through cross-attention, which lets each decoder layer attend over the final encoder layer. The decoder can therefore exploit the encoded information accumulated by the encoder, which naturally pushes the model toward producing output that is closely tied to the original input. The following are popular text-to-text models that keep both the encoder and decoder parts of the transformer:
- T5: Exploring the limits of transfer learning with a unified text-to-text transformer
- BART: Bidirectional and Auto-Regressive Transformer
- PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-Sequence models
Let’s begin by understanding and working with T5.
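Before diving into the details, the following minimal sketch shows the encoder-decoder setup in action. It assumes the Hugging Face transformers library (plus sentencepiece) and the public t5-small checkpoint; the exact checkpoint and task prefix are only illustrative. The decoder generates the output while attending, through cross-attention, over the encoder's final hidden states, so the generated text stays tied to the input.

```python
# Minimal sketch (assumptions: transformers + sentencepiece installed,
# public "t5-small" checkpoint available).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 casts every task as text-to-text; a task prefix tells the model what to do.
text = "translate English to German: The house is wonderful."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# The decoder generates tokens one by one, attending over the encoder's
# final hidden states through cross-attention.
output_ids = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# Expected output along the lines of: "Das Haus ist wunderbar."
```

The same model and the same `generate()` call handle summarization, translation, or classification; only the task prefix in the input text changes, which is the essence of the text-to-text formulation.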
Multi-task learning with T5
Most NLP architectures, ranging from Word2Vec to transformers, learn embeddings and other parameters by predicting masked words from their context (neighboring) words...
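To make the masked-word objective concrete, here is a minimal sketch (an assumption for illustration, not part of the original text) using a BERT-style fill-mask pipeline from Hugging Face transformers. The model predicts the masked token purely from the surrounding context words:

```python
# Minimal sketch (assumption: transformers installed, "bert-base-uncased"
# checkpoint available) of masked-word prediction from context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model fills in [MASK] using the neighboring (context) words.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  (score: {prediction['score']:.3f})")
```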