ANNs for natural language processing
The previous section showed how ANNs can learn mappings of numerical inputs to numerical outputs. Language, however, is inherently non-numeric: a sentence is a sequence of discrete words from a large vocabulary. Building a neural network-based word predictor poses the following challenges:
- The inputs to the model are discrete words. Since ANNs operate on numeric inputs and outputs, a suitable mapping from words to numbers and vice versa is required.
- The inputs are also sequential. Unlike a bigram model, the network should be able to take more than one preceding word into account when predicting the next word.
- The output of the language model needs to be a probability distribution over all possible next words. To form a proper distribution, the outputs must be normalized so that they are non-negative and sum to one.
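The three requirements above can be illustrated with a minimal NumPy sketch: words are mapped to integer ids and then to learnable embedding vectors, several context embeddings are concatenated into one numeric input, and a softmax turns the network's raw scores into a valid probability distribution. The toy vocabulary, embedding size, and random weights here are illustrative assumptions, not part of any particular model.

```python
import numpy as np

# Hypothetical toy vocabulary; a real model uses tens of thousands of words.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}  # discrete words -> integers

rng = np.random.default_rng(0)
d = 4                                    # embedding dimension (assumed)
E = rng.normal(size=(len(vocab), d))     # one learnable vector per word

def softmax(z):
    # Exponentiation makes scores non-negative; dividing by the sum
    # normalizes them to sum to one. Subtracting the max is a standard
    # trick for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

# A context of more than one word: "the cat sat"
context = ["the", "cat", "sat"]
x = np.concatenate([E[word_to_id[w]] for w in context])  # numeric input

# A single linear layer mapping the context to one score per vocabulary word
W = rng.normal(size=(len(vocab), x.size))
p = softmax(W @ x)  # probability distribution over the next word

print(p)
```

In a trained model, `E` and `W` would be learned from data; here they are random, so `p` is an arbitrary (but valid) distribution.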
The following sections will explain these challenges and review how they are addressed in modern language models.