Word embeddings
The field of natural language processing (NLP) is advancing pretty quickly these days, as much as modern data science and artificial intelligence.Â
Algorithms such as word2vec (Mikolov and others, 2013) and GloVe (Pennington and others, 2014) have been pioneers in the field, and although strictly neither of them is related to deep learning, the models trained with them are used as input data in many applications of deep learning to NLP.Â
We will briefly describe word2vec and GloVe, which are perhaps the most commonly used algorithms for word embedding, although research in the intersection of neural networks and language goes back to at least Jeff Elman in the 1990s.
word2vec
The word2vec algorithm (or, rather, family of algorithms) takes a text corpus as input and produces the word vectors as output. It first constructs a vocabulary from the training text data and then learns vector representation of words. Then we use those vectors as features for machine learning algorithms...