To develop applications that can understand voice or text input, we use techniques from the natural language processing domain. We have just seen several widely used ways to preprocess text: tokenization, stop-word removal, stemming, lemmatization, POS tagging, and named entity recognition.
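To make these steps concrete, here is a minimal sketch of that preprocessing pipeline. It assumes the NLTK library (not mentioned in the text) with its standard corpora and models fetched via nltk.download(...); the sample sentence is illustrative only.

```python
# A minimal preprocessing sketch, assuming NLTK with the resources
# "punkt", "stopwords", "wordnet", "averaged_perceptron_tagger",
# "maxent_ne_chunker", and "words" already downloaded.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "Apple was founded by Steve Jobs in California."

tokens = nltk.word_tokenize(text)                                # tokenization
filtered = [t for t in tokens
            if t.lower() not in stopwords.words("english")]      # stop-word removal
stems = [PorterStemmer().stem(t) for t in filtered]              # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in filtered]    # lemmatization
tagged = nltk.pos_tag(tokens)                                    # POS tagging
entities = nltk.ne_chunk(tagged)                                 # named entity recognition

print(stems, lemmas, tagged, sep="\n")
```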
Word embedding algorithms, most notably Word2Vec, draw inspiration from the distributional hypothesis in semantics, which states that the meaning of a word is defined by its context. Using an autoencoder-like neural network, we learn a fixed-size vector for each word in a text corpus. Effectively, this neural network captures the context of the word and encodes it in the corresponding vector. Then, using linear algebra operations on those vectors, we can discover interesting relationships between words. For example, we can find semantically close words (words whose vectors have a high cosine similarity), or solve word analogies through vector arithmetic, as in the well-known example vec(king) − vec(man) + vec(woman) ≈ vec(queen).
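The sketch below shows how such vectors can be trained and queried in practice. It assumes the gensim library (version 4.x; not mentioned in the text); the corpus and hyperparameters are toy values chosen only to make the example run, so the similarities it prints will not be meaningful.

```python
# A minimal Word2Vec sketch, assuming gensim 4.x.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["cats", "and", "dogs", "are", "animals"],
]  # a toy corpus; real training needs far more text

# Learn a fixed-size vector (here 50 dimensions) for each word.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

# Cosine similarity between two word vectors.
print(model.wv.similarity("king", "queen"))

# Nearest neighbours of a word by cosine similarity.
print(model.wv.most_similar("king", topn=3))
```

With a large enough corpus, the same most_similar call also supports analogy queries via vector arithmetic, e.g. most_similar(positive=["king", "woman"], negative=["man"]).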