Embeddings for NLP
Words do not come with a natural numerical representation of their meaning. In images, we already have rich vector representations (the values of each pixel within the image), so it would clearly be beneficial to have a similarly rich vector representation for words. When parts of language are represented in a high-dimensional vector format, they are known as embeddings. Through analysis of a corpus, by determining which words frequently appear together, we can obtain an n-length vector for each word that better represents its semantic relationship to all other words. We saw previously that we can easily represent words as one-hot encoded vectors:
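As a minimal sketch, assuming a toy three-word vocabulary (not one used in the text), a one-hot encoding could be built as follows:

```python
# Toy vocabulary; each word maps to an index and a one-hot vector.
vocab = ["cat", "dog", "fish"]

def one_hot(word, vocab):
    """Return a one-hot encoded vector for `word` over `vocab`."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat", vocab))   # [1, 0, 0]
print(one_hot("dog", vocab))   # [0, 1, 0]
```

Note that a one-hot vector is as long as the vocabulary and carries no information about how words relate to one another: the vectors for any two different words are always orthogonal.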
On the other hand, embeddings are vectors of length n (in the following example, n = 3) that can take any value:
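For illustration, a set of such embeddings for the same toy vocabulary might look like the following (the values are hypothetical; in practice they would be learned from a corpus rather than set by hand):

```python
# Hypothetical 3-dimensional embeddings for a toy vocabulary.
# In practice these values are learned from a corpus, not chosen manually.
embeddings = {
    "cat":  [0.81, -0.32, 0.45],
    "dog":  [0.76, -0.29, 0.51],
    "fish": [-0.12, 0.94, 0.08],
}

print(embeddings["dog"])  # [0.76, -0.29, 0.51]
```

Because the values are continuous, semantically similar words (here, "cat" and "dog") can be assigned vectors that lie close together in the n-dimensional space.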
These embeddings represent the word's vector...