In this chapter, we will describe some of the most exciting techniques in modern (at the time of writing—late 2017) machine learning, recurrent neural networks. They are, however, not new; they have been around since the 1980s, but they have become popular due to the numerous records in language-related tasks in recent years.
Why do we need a different type of architecture for text? Consider the following example:
"I live in Prague since 2015"
and
"Since 2015 I live in Prague"
If we would like to teach a traditional feed-forward network such as a perceptron or a multi-layer perceptron to identify the date I moved to Prague, then this network would have to learn separate parameters for each input feature, which in particular implies that it would have to learn grammar to answer this simple question...