Packt+ | Advance your knowledge in tech

You're reading from Recurrent Neural Networks with Python Quick Start Guide Sequential learning and language modeling with TensorFlow

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781789132335

Length 122 pages

Edition 1st Edition

Languages

Python

Tools

TensorFlow

Concepts

Neural Networks

Author (1):

Simeon Kostadinov

View More author details

Hopefully, now you have a good understanding of how a recurrent neural network works. Unfortunately, this simple model fails to make good predictions on longer and complex sequences. The reason behind this lies in the so-called vanishing/exploding gradient problem that prevents the network from learning efficiently.

As you already know, the training process updates the weights and biases using the backpropagation algorithm. Let's dive one step further into the mathematical explanations. In order to know how much to adjust the parameters (weights and biases), the network computes the derivative of the loss function (at each time step) with respect to the current value of these parameters. When this operation is done for multiple time steps with the same set of parameters, the value of the derivative can become too large or too small. Since we use it to update the parameters, a large value can result in undefined weights and biases and a small value can result in no significant update, and thus no learning.

Derivative is a way to show the rate of change; that is, the amount by which a function is changing at one given point. In our case, this is the rate of change of the loss function with respect to the given weights and biases.

This issue was first addressed by Bengio et al. in 1994, which led to an introduction of the LSTM network with the aim of solving the vanishing/exploding gradient problem. Later in the book, we will reveal how LSTM does this in an excellent fashion. Another model, which also overcomes this challenge, is the gated recurrent unit. In Chapter 3, Generating Your Own Book Chapter, you will see how this is being done.

For more information on the vanishing/exploding gradient problem, it would be useful to go over Lecture 8 from the course Natural Language Processing with Deep Learning by Stanford University (https://www.youtube.com/watch?v=Keqep_PKrY8) and the paper On the difficulty of training recurrent neural networks (http://proceedings.mlr.press/v28/pascanu13.pdf).