To learn the parameters of a neural network model from textual data, we first have to convert the text into a format the network can ingest. Neural networks generally consume text in the form of numeric vectors, and the algorithms that convert raw text data into numeric vectors are known as word embedding algorithms.
One popular method of word embedding is the one-hot encoding that we saw in MNIST image classification. Suppose our text dataset has a vocabulary of 60,000 words. Each word can then be represented by a one-hot encoded vector with 60,000 elements, in which every element is zero except the one at that word's index, which is one.
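As a minimal sketch of this idea, the following example one-hot encodes a tiny five-word vocabulary instead of 60,000 words; the vocabulary and helper function names here are illustrative, not from a specific library:

```python
import numpy as np

# Illustrative five-word vocabulary; a real dataset might have 60,000 words.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, vocab_size):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = np.zeros(vocab_size, dtype=np.float32)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("cat", len(vocab)))  # [0. 1. 0. 0. 0.]
```

Note that each vector has exactly one nonzero entry, so its length grows linearly with the vocabulary size.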
However, the one-hot encoding method has its drawbacks. Firstly, for vocabularies with a large number...