Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Intelligent Projects Using Python

You're reading from   Intelligent Projects Using Python 9 real-world AI projects leveraging machine learning and deep learning with TensorFlow and Keras

Arrow left icon
Product type Paperback
Published in Jan 2019
Publisher Packt
ISBN-13 9781788996921
Length 342 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Author (1):
Arrow left icon
Santanu Pattanayak Santanu Pattanayak
Author Profile Icon Santanu Pattanayak
Santanu Pattanayak
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Foundations of Artificial Intelligence Based Systems 2. Transfer Learning FREE CHAPTER 3. Neural Machine Translation 4. Style Transfer in Fashion Industry using GANs 5. Video Captioning Application 6. The Intelligent Recommender System 7. Mobile App for Movie Review Sentiment Analysis 8. Conversational AI Chatbots for Customer Service 9. Autonomous Self-Driving Car Through Reinforcement Learning 10. CAPTCHA from a Deep-Learning Perspective 11. Other Books You May Enjoy

The backpropagation method of training neural networks

In the backpropagation method, neural networks are trained through the gradient descent technique, where the combined weights vector, W, is updated iteratively, as follows:

Here, η is the learning rate, W(t+1) and W(t) are the weight vectors at iterations (t+1) and (t), respectively, and ∇C(W(t)) is the gradient of the cost function or the error function, with respect to the weight vector, W, at iteration (t). The previous algorithm for an individual weight or bias generalized by w ∈ W can be represented as follows:

As you can gather from the previous expressions, the heart of the gradient descent method of learning relies on computing the gradient of the cost function or the error function, with respect to each weight.

From the chain rule of differentiation, we know that if we have y = f(x), z = f(y), then the following is true:

This expression can be generalized to any number of variables. Now, let's take a look at a very simple neural network, as illustrated in the following diagram, in order to understand the backpropagation algorithm:

Figure 1.10: A network illustrating backpropagation

Let the input to the network be a two-dimensional vector, x = [x1 x2]T, and the corresponding output label and prediction be and , respectively. Also, let's assume that all of the activation units in the neural network are sigmoids. Let the generalized weight connecting any unit i in layer (l-1) to unit j in layer l be denoted by , while the bias in any unit i in layer l should be denoted by . Let's derive the gradient for one data point; the total gradient can be computed as the sum of all of the data points used in training (or in a mini-batch). If the output is continuous, then the loss function, C, can be chosen as the square of the error in prediction:

The weights and biases of the network, cumulatively represented by the set W, can be determined by minimizing the cost function with respect to the W vector, which is as follows:

To perform the minimization of the cost function iteratively through gradient descent, we need to compute the gradient of the cost function with respect to each weight, w ∈ W, as follows:

Now that we have everything that we need, let's compute the gradient of the cost function, C, with respect to the weight, . Using the chain rule of differentiation, we get the following:

Now let's look at the following formula:

As you can see in the previous expression, the derivative is nothing but the error in prediction. Generally, the output unit activation function is linear in the case of regression problems, and hence the following expression applies:

So, if we were to compute the gradient of the cost function with respect to the total input at the output unit, it would be . This is still equal to the error in prediction of the output.

The total input at the output unit, as a function of the incoming weights and activations, can be expressed as follows:

This means that, and the derivative of the cost function with respect to the weight, , contributing to the input of the output layer is given via the following:

As you can see, the error is backpropagated in computing the gradient of the cost function, with respect to the weights in the layers preceding the final output layer. This becomes more obvious when we compute the gradient of the cost function with respect to the generalized weight, . Let's take the weight corresponding to j=1 and k=2; that is, . The gradient of the cost function, C, with respect to this weight can be expressed as follows:

Now, , which means that, .

So, once we have figured out the gradient of the cost function with respect to the total input to a neuron as , the gradient of any weight, w, contributing to the total input, s, can be obtained by simply multiplying the activation, z, associated with the weight.

Now, the gradient of the cost function with respect to the total input, , can be derived by chain rule again, as follows:

Since all of the units of the neural network (except for the output unit) are sigmoid activation functions, the following is the case:

Combining (1), (2), and (3), we get the following:

In the preceding derived gradient expressions, you can see that the error in prediction, , is backpropagated by combining it with the relevant activations and weights (as per the chain rule of differentiation) for computing the gradients of the weights at each layer, hence, the name backpropagation in AI nomenclature.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image