Translation language modeling and cross-lingual knowledge sharing
So far, you have learned about masked language modeling (MLM) as a cloze task. However, language modeling with neural networks falls into three main categories, based on both the modeling approach and its practical usage, as follows:
- MLM
- Causal language modeling (CLM)
- Translation language modeling (TLM)
It is also important to note that there are other pretraining objectives, such as next-sentence prediction (NSP) and sentence order prediction (SOP), but here we consider only token-based language modeling; these three are the main approaches used in the literature. MLM, described in detail in previous chapters, is closely related to the cloze task in language learning.
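To make the cloze analogy concrete, here is a minimal sketch of how MLM training data can be prepared, assuming a BERT-style scheme in which roughly 15% of positions become prediction targets and, of those, 80% are replaced with a `[MASK]` token, 10% with a random token, and 10% left unchanged. The function and variable names are illustrative, not from any library:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """BERT-style MLM masking: pick ~mask_prob of positions as targets;
    of those, 80% become mask_token, 10% a random token, 10% unchanged."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                    # the model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_token         # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token as-is
    return inputs, labels

tokens = "transformers changed the natural language processing field".split()
masked, targets = mask_tokens(tokens, vocab=tokens)
print(masked)
```

During pretraining, the model sees `masked` as input and is trained to predict the original tokens stored in `targets` at the masked positions only, exactly as in a fill-in-the-blank exercise.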
CLM is defined as predicting the next token given the preceding tokens. For example, given the following context, you can easily predict the next token:
<s> Transformers changed the natural...
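The next-token idea can be sketched with a toy causal model in pure Python: a bigram counter whose "context" is just the previous token, predicting the most frequent continuation. The corpus and helper names below are illustrative assumptions, not a real library API:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies for each context token: a minimal
    causal language model conditioned on one token of history."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Greedy decoding: return the most frequent continuation of prev."""
    return counts[prev].most_common(1)[0][0]

corpus = [
    "transformers changed the natural language processing field",
    "transformers changed the world",
]
model = train_bigram(corpus)
print(predict_next(model, "natural"))  # → language
```

A real CLM such as GPT conditions on the entire left context with a neural network rather than on one token with counts, but the training objective is the same: maximize the probability of each token given everything before it.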