BERT and its variants
As an example of an LLM technology based on transformers, we will demonstrate the use of BERT (Bidirectional Encoder Representations from Transformers), an open source NLP approach developed by Google that forms the foundation of many of today's state-of-the-art NLP systems. The source code for BERT is available at https://github.com/google-research/bert.
BERT's key technical innovation is that training is bidirectional: the model takes into account both the words before and after each position in the input. A second innovation is that BERT's pretraining uses a masked language model, in which the system masks out words in the training data and attempts to predict them from the surrounding context, as the sketch below illustrates.
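To make this concrete, here is a minimal sketch of masked-word prediction. It uses the Hugging Face transformers library and the bert-base-uncased checkpoint, which are our own choices for illustration rather than part of Google's original release:

```python
# A minimal sketch of BERT's masked language model objective, using the
# Hugging Face "transformers" library (an assumption; not Google's original
# TensorFlow release).
from transformers import pipeline

# "fill-mask" loads a BERT checkpoint pretrained on the masked-word task.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Mask one word; BERT uses both the left and right context to predict it.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Because BERT reads the whole sentence at once, the words on both sides of the [MASK] token inform the prediction.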
BERT also uses only the encoder part of the encoder-decoder architecture because, unlike machine translation systems, it focuses solely on understanding language; it doesn't generate it.
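The practical consequence is that a forward pass through BERT yields one contextual vector per input token rather than generated text. The following sketch, again assuming the Hugging Face transformers library, shows this:

```python
# A minimal sketch showing that BERT is encoder-only: it maps input text
# to contextual vectors instead of producing output text. Library and
# checkpoint names are assumptions for illustration.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT encodes text into vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token: representations, not words.
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```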
Another advantage of BERT over the systems we've discussed earlier in this book is that the training process...