Neural Language Models
Chapter 3, Fundamentals of Natural Language Processing, introduced statistical language models (LMs): probability distributions over sequences of words. An LM can be used to predict the next word in a sentence or, more generally, to compute the probability distribution over the next word.
The sequence of observed words is x1, x2, ..., xt, and the next word is xt+1. V is the vocabulary, j is the position of a word within that vocabulary, and wj is the word located at position j in V.
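With this notation, the next-word prediction task can be written as estimating a conditional probability; the formulation below is the standard one and follows directly from the definitions above:

\[
P(x_{t+1} = w_j \mid x_t, x_{t-1}, \ldots, x_1), \quad w_j \in V
\]

That is, given the words seen so far, the model assigns a probability to every candidate word wj in the vocabulary V.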
You use LMs every day. The keyboards on cell phones use this technology to predict the next word of a sentence, and search engines such as Google use it to predict what you want to type into their search box.
We talked about the n-gram model, which estimates these probabilities by counting word sequences such as bigrams in a corpus, but that approach has limitations, most notably its inability to capture long-range dependencies. Deep NLP and neural LMs will help us address these problems.
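To make the counting approach concrete before moving on, here is a minimal sketch of a bigram model; the tiny corpus is hypothetical, and real implementations add smoothing for unseen pairs:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, already tokenized into words.
corpus = "the cat sat on the mat the cat ran".split()

# Count each bigram (x_t, x_{t+1}) observed in the corpus.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_distribution(prev):
    """Estimate P(x_{t+1} = w | x_t = prev) by relative frequency."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# The model only sees one word of context: after "the" it can
# propose "cat" or "mat", but it has no way to use anything said
# earlier in the sentence -- the long-dependency limitation above.
print(next_word_distribution("the"))
```

Running this prints a distribution such as {'cat': 0.667, 'mat': 0.333}; extending the context window to trigrams or beyond makes the counts exponentially sparser, which is the sparsity problem neural LMs are designed to avoid.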