Word2vec – Learning Word Embeddings
In this chapter, we will discuss a topic of paramount importance in NLP: Word2vec, a data-driven technique for learning powerful numerical representations (that is, vectors) of words or tokens in a language. Languages are complex, so the models we build to solve NLP problems need sound language-understanding capabilities. Many methods of transforming words into numerical representations fail to sufficiently capture the semantics and contextual information that a word carries. For example, the feature representation of the word forest should be very different from that of the word oven, as these words are rarely used in similar contexts, whereas the representations of forest and jungle should be very similar. Failing to capture this information leads to underperforming models.
Word2vec tries to overcome this problem by learning word representations from large amounts of text.
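To make the similarity property concrete, here is a minimal sketch of how cosine similarity over word vectors captures it. The four-dimensional vectors and their values below are purely hypothetical stand-ins for learned embeddings (real Word2vec embeddings typically have tens to hundreds of dimensions and are learned from data, not hand-assigned):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings chosen only to illustrate the desired property:
# "forest" and "jungle" point in similar directions, while "oven" does not.
embeddings = {
    "forest": np.array([0.9, 0.8, 0.1, 0.0]),
    "jungle": np.array([0.8, 0.9, 0.2, 0.1]),
    "oven":   np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["forest"], embeddings["jungle"]))  # high, ~0.99
print(cosine_similarity(embeddings["forest"], embeddings["oven"]))    # low, ~0.12
```

With well-learned embeddings, this is exactly the behavior we expect: words used in similar contexts end up close together in the vector space, and unrelated words end up far apart.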
Word2vec is called...