Before we go
This chapter focused more on applying transformers to a problem than on searching for a silver bullet transformer model, which does not exist.
You have two main options for solving an NLP problem: look for new transformer models, or create reliable, durable methods for implementing the ones you have.
Looking for the silver bullet
Looking for a silver bullet transformer model can be either time-consuming or rewarding, depending on how much time and money you are willing to spend on continually changing models.
For example, a new approach to transformers can be found through disentanglement. Disentanglement in AI separates the features of a representation, which makes the training process more flexible. Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen designed DeBERTa, a disentangled version of a transformer in which each word is represented by separate vectors for its content and its position, and described the model in an interesting article:
DeBERTa: Decoding-enhanced BERT with Disentangled Attention, https://arxiv.org/abs/2006.03654
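To make the idea concrete, here is a minimal sketch of disentangled attention in PyTorch. It illustrates the principle rather than DeBERTa's actual implementation: each token carries a separate content vector and a separate position vector, and the attention score sums content-to-content, content-to-position, and position-to-content interactions. The dimensions are toy values, and absolute position embeddings stand in for the relative-position embeddings DeBERTa actually uses.

import torch

torch.manual_seed(0)  # reproducible toy tensors

n, d = 4, 8  # sequence length and hidden size (illustrative values)

# Each token gets a content embedding and a separate position embedding,
# rather than a single entangled sum of the two.
content = torch.randn(n, d)
position = torch.randn(n, d)

# Separate projection matrices for content and position queries/keys.
W_qc, W_kc = torch.randn(d, d), torch.randn(d, d)
W_qr, W_kr = torch.randn(d, d), torch.randn(d, d)

Q_c, K_c = content @ W_qc, content @ W_kc
Q_r, K_r = position @ W_qr, position @ W_kr

# Disentangled score: content-to-content + content-to-position
# + position-to-content. This is a simplified analogue of the paper's
# formulation, scaled by sqrt(3d) to account for the three terms.
scores = Q_c @ K_c.T + Q_c @ K_r.T + Q_r @ K_c.T
weights = torch.softmax(scores / (3 * d) ** 0.5, dim=-1)

print(weights.shape)  # (4, 4) attention weights over the toy sequence

Because the content and position projections are kept apart, each of the three interaction terms can be trained and analyzed separately, which is the flexibility that disentanglement provides.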
The two...