BERT – one of the autoencoding language models
BERT was one of the first autoencoding language models to utilize the encoder Transformer stack with slight modifications for language modeling.
The BERT architecture is a multilayer Transformer encoder based on the original Transformer implementation. The Transformer model was originally designed for machine translation, but BERT's key contribution is repurposing the encoder stack for language modeling. After pretraining, this language model is able to provide a general, contextual understanding of the language it was trained on.
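As a minimal sketch of what this means in practice (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint), the following snippet loads the pretrained BERT encoder and inspects the contextual representation it produces for each token of a sentence:

```python
# A minimal sketch, assuming the transformers library and the
# "bert-base-uncased" checkpoint are available.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run it through the pretrained encoder
inputs = tokenizer("BERT is an autoencoding language model.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```

Each output vector depends on the whole input sentence, which is what makes these representations useful for downstream language understanding tasks.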
In the following subsections, you will learn more about autoencoding models such as BERT, including how to pretrain a model and share it with the community. The next section explains BERT in more detail.
BERT language model pretraining tasks
To have a clear understanding of the masked language modeling (MLM) objective used by BERT, let’s define it in more detail. MLM is the task of masking a portion of the input tokens and training the model to predict the original tokens from their surrounding context, as the sketch below illustrates.
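As a minimal sketch (again assuming the Hugging Face transformers library and the bert-base-uncased checkpoint), the fill-mask pipeline shows the MLM objective at inference time: the model predicts the most likely tokens for the [MASK] position.

```python
# A minimal sketch, assuming the transformers library is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT's mask token is [MASK]; the model ranks candidate replacements.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```

During pretraining, the same idea is applied at scale: a fraction of the input tokens is masked, and the model is trained to recover them, which forces the encoder to learn bidirectional, contextual representations.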