DeBERTa
Another approach to improving transformers is disentanglement. Disentanglement in AI separates the representation features so that each can be trained more flexibly. Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen designed DeBERTa, a transformer with disentangled attention, and described the model in their article, DeBERTa: Decoding-enhanced BERT with Disentangled Attention: https://arxiv.org/abs/2006.03654
The two main ideas implemented in DeBERTa are:
- Disentangle the content and position representations in the transformer model so that the two vectors are trained separately (see the sketch after this list)
- Use absolute positions in the decoder to predict masked tokens during pretraining
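To make the first idea concrete, here is a toy sketch of disentangled attention scores: content vectors and position vectors are kept separate and combined as content-to-content, content-to-position, and position-to-content terms. The shapes, projection matrices, and the position-to-content indexing are simplifications for illustration, not the actual DeBERTa implementation, which indexes position vectors by the relative distance between tokens.

```python
import torch

d, seq_len = 64, 8
H = torch.randn(seq_len, d)   # content vectors, one per token
P = torch.randn(seq_len, d)   # position vectors (simplified; DeBERTa uses relative distances)

# Separate projections for content and position
Wq_c, Wk_c = torch.randn(d, d), torch.randn(d, d)
Wq_r, Wk_r = torch.randn(d, d), torch.randn(d, d)

Qc, Kc = H @ Wq_c, H @ Wk_c
Qr, Kr = P @ Wq_r, P @ Wk_r

c2c = Qc @ Kc.T   # content-to-content attention scores
c2p = Qc @ Kr.T   # content-to-position scores
p2c = Qr @ Kc.T   # position-to-content scores (indexing simplified)

# Combine the three terms and normalize, as a toy analog of disentangled attention
scores = (c2c + c2p + p2c) / (3 * d) ** 0.5
weights = torch.softmax(scores, dim=-1)
print(weights.shape)  # (seq_len, seq_len)
```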
The authors provide the code on GitHub: https://github.com/microsoft/DeBERTa
DeBERTa exceeds the human baseline on the SuperGLUE leaderboard:
Figure 15.5: DeBERTa on the SuperGLUE leaderboard
Let's run an example on Hugging Face.
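As a minimal sketch of such an example, the following code loads a DeBERTa checkpoint from the Hugging Face hub and encodes one sentence. The checkpoint name microsoft/deberta-base is an assumption; any DeBERTa model from the hub can be substituted.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; swap in any DeBERTa model from the Hugging Face hub
model_name = "microsoft/deberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("DeBERTa disentangles content and position.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```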