In Chapter 6, Language Modeling, we introduced several different language models (word2vec, GloVe, and fastText) that use the context of a word (its surrounding words) to create word vectors (embeddings). These models share some common properties:
- They are context-free (which may seem to contradict the previous statement) because they create a single, global word vector for each word, based on all of its occurrences in the training text. For example, lead can have completely different meanings in the phrases lead the way and lead atom, yet the model will try to embed both meanings in the same word vector, as the sketch after this list illustrates.
- They are position-free because they don't take the order of the contextual words into account when training the embedding vectors.
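The following is a minimal sketch of the context-free property, assuming the gensim library is available; the toy sentences, vector size, and window are illustrative values rather than settings from Chapter 6. A static model such as word2vec keeps exactly one vocabulary entry (and therefore one vector) for lead, no matter which sentence the word appears in:

```python
from gensim.models import Word2Vec

# Two toy sentences in which "lead" has different meanings
sentences = [
    ["she", "will", "lead", "the", "way"],
    ["the", "lead", "atom", "is", "heavy"],
]

# min_count=1 keeps every word; vector_size and window are illustrative
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=42)

# The vocabulary contains a single entry for "lead", so both meanings
# share the same global, context-free embedding vector
vector = model.wv["lead"]
print(vector.shape)   # (50,)
```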
In contrast, it's possible to create transformer-based language models, which are both context- and position-dependent...
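As a rough illustration of that contrast, here is a sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (not models introduced in Chapter 6). Because the transformer attends to the surrounding words and their positions, the two occurrences of lead receive different vectors:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def lead_vector(sentence):
    # Encode the sentence and return the hidden state of the "lead" token
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("lead")]

v1 = lead_vector("she will lead the way")
v2 = lead_vector("the lead atom is heavy")

# Cosine similarity below 1.0: the same word gets two different,
# context-dependent embeddings
print(torch.cosine_similarity(v1, v2, dim=0).item())
```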