New frontiers – pretrained transformer models
Word2vec and GloVe embeddings capture more semantic information than the bag-of-words approach. However, they assign each token a single fixed-length vector that does not differentiate between context-specific usages. To address unsolved problems such as multiple meanings for the same word, known as polysemy, several new models have emerged that build on the attention mechanism (Vaswani et al., 2017) to learn more contextualized word embeddings, as illustrated by the sketch after the following list. The key characteristics of these models are as follows:
- The use of bidirectional language models that process text both left-to-right and right-to-left for a richer context representation
- The use of semi-supervised pretraining on a large generic corpus to learn universal language aspects in the form of embeddings and network weights that can be used and fine-tuned for specific tasks (a form of transfer learning that we will discuss in more detail in...
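To make the contrast with static embeddings concrete, the short sketch below queries a pretrained transformer for the vector of the word "bank" in different sentences. The choice of library (Hugging Face transformers), checkpoint (bert-base-uncased), and the helper function embed_token are illustrative assumptions rather than part of any specific model discussed here.

```python
# A minimal sketch of context-dependent embeddings from a pretrained transformer.
# Assumptions: the Hugging Face transformers library and the bert-base-uncased
# checkpoint; neither is prescribed by the text above.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def embed_token(sentence: str, target: str) -> torch.Tensor:
    """Return the last-hidden-state vector for the target token in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(target)                # position of the target token
    return outputs.last_hidden_state[0, idx]  # (hidden_size,) contextual embedding


# The same surface form "bank" receives different vectors in different contexts,
# unlike a static word2vec or GloVe embedding, which is identical everywhere.
river = embed_token("the boat drifted toward the river bank", "bank")
money = embed_token("she deposited the check at the bank", "bank")
same = embed_token("he withdrew cash from the bank", "bank")

cos = torch.nn.functional.cosine_similarity
print(f"river vs. money sense: {cos(river, money, dim=0).item():.3f}")
print(f"money vs. money sense: {cos(money, same, dim=0).item():.3f}")
```

Under these assumptions, one would expect the similarity between the two financial uses of "bank" to be higher than that between the financial and river-bank uses, whereas a static embedding would return the same vector in all three sentences.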