How word embeddings encode semantics
The bag-of-words model represents documents as sparse, high-dimensional vectors that reflect the tokens they contain. Word embeddings instead represent tokens as dense, lower-dimensional vectors so that the relative location of words reflects how they are used in context. They embody the distributional hypothesis from linguistics, which holds that words are best defined by the company they keep.
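To make the contrast concrete, the sketch below compares the two representations on a toy pair of documents. The documents, the embedding dimensionality, and the randomly initialized lookup table (a stand-in for vectors learned from data) are all illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the driver of the car", "the motorist parked the car"]

# Bag-of-words: one sparse count vector per document;
# dimensionality equals the vocabulary size and is mostly zeros
vectorizer = CountVectorizer().fit(docs)
bow = vectorizer.transform(docs)
print(bow.shape)      # (2, vocabulary_size)
print(bow.toarray())  # integer counts, sparse for larger corpora

# Word embeddings: one dense, fixed-size vector per token,
# e.g. 100 dimensions regardless of vocabulary size.
# Random values here stand in for trained embeddings.
embedding_dim = 100
rng = np.random.default_rng(0)
lookup = {token: rng.normal(size=embedding_dim)
          for token in vectorizer.get_feature_names_out()}
print(lookup["driver"].shape)  # (100,)
```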
Word vectors capture numerous semantic aspects: not only are synonyms assigned nearby embeddings, but words can have multiple degrees of similarity. For example, the word "driver" could be similar to "motorist" or to "factor." Furthermore, embeddings encode relationships between pairs of words, such as analogies (Tokyo is to Japan what Paris is to France, or went is to go what saw is to see), as we will illustrate later in this section.
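As a quick preview, the following sketch queries pretrained vectors with gensim. The specific model ("glove-wiki-gigaword-100"), the download step, and the exact neighbors it returns are assumptions; any pretrained KeyedVectors would work the same way.

```python
import gensim.downloader as api

# Load pretrained GloVe vectors (an assumed, lowercased model;
# the first call downloads them)
wv = api.load("glove-wiki-gigaword-100")

# Multiple degrees of similarity: "driver" relates to both
# "motorist" (a person) and "factor" (a cause)
print(wv.similarity("driver", "motorist"))
print(wv.similarity("driver", "factor"))

# Analogy via vector arithmetic: tokyo - japan + france ≈ paris
print(wv.most_similar(positive=["tokyo", "france"],
                      negative=["japan"], topn=3))
```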
Embeddings result from training a neural network to predict words from their context...
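A minimal sketch of this idea, assuming gensim's Word2Vec implementation, a toy tokenized corpus, and placeholder hyperparameters:

```python
from gensim.models import Word2Vec

# Toy tokenized corpus (placeholder; real training needs far more text)
sentences = [
    ["the", "driver", "parked", "the", "car"],
    ["the", "motorist", "stopped", "the", "car"],
    ["supply", "was", "a", "key", "driver", "of", "prices"],
]

# CBOW (sg=0, the default): the network predicts the center word
# from its surrounding context; the learned weights are the embeddings
model = Word2Vec(sentences, vector_size=50, window=2,
                 min_count=1, sg=0, epochs=50, seed=1)

print(model.wv["driver"].shape)              # (50,)
print(model.wv.most_similar("driver", topn=3))
```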