A practical overview of backpropagation
Multi-layer perceptrons learn from training data through a process called backpropagation. In this section, we will cover the basics; more details can be found in Chapter 15, The Math behind Deep Learning. The process can be described as a way of progressively correcting mistakes as soon as they are detected. Let's see how this works.
Remember that each neural network layer has an associated set of weights that determine the output values for a given set of inputs. Additionally, remember that a neural network can have multiple hidden layers.
At the beginning, all the weights are assigned random values. Then, the net is activated for each input in the training set: values are propagated forward from the input stage through the hidden stages to the output stage, where a prediction is made. Note that we've kept Figure 38 simple by representing only a few values with green dotted lines, but in reality all the values are propagated forward through the network:
Figure 38: Forward step in backpropagation
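To make the forward step concrete, here is a minimal sketch in plain Python. It is not taken from a particular library: the layer sizes (2 inputs, 3 hidden units, 1 output), the sigmoid activation, the input values, and the helper names init_layer and forward are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Return random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate the input x through every layer and keep all activations."""
    activations = [x]
    for weights, biases in layers:
        previous = activations[-1]
        # Each neuron computes a weighted sum of the previous layer, then applies sigmoid.
        output = [
            sigmoid(sum(w * a for w, a in zip(row, previous)) + b)
            for row, b in zip(weights, biases)
        ]
        activations.append(output)
    return activations


rng = random.Random(0)  # fixed seed so the random weights are reproducible
layers = [init_layer(2, 3, rng), init_layer(3, 1, rng)]  # input -> hidden -> output

activations = forward([0.5, -1.0], layers)
prediction = activations[-1][0]
print("hidden activations:", activations[1])
print("prediction:", prediction)
```

Because the weights are random at this point, the prediction carries no meaning yet. Keeping the intermediate activations is a deliberate choice in this sketch, since the backward step needs them.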
Since we know the true observed value in the training set, it is possible to calculate the error made in the prediction. The key intuition for backpropagation is to propagate the error back (see Figure 39), using an appropriate optimizer algorithm such as gradient descent to adjust the neural network weights with the goal of reducing the error (again, for the sake of simplicity, only a few error values are represented here):
Figure 39: Backward step in backpropagation
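The backward step can be illustrated with the smallest possible chain: one input, one hidden neuron, and one output neuron. The sketch below assumes sigmoid activations, a squared-error loss, and plain gradient descent. The starting weights, the learning rate, and the helper name predict are hypothetical values picked only to show the mechanics.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w1, w2):
    """Forward step: input -> hidden neuron -> output neuron."""
    h = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * h)
    return h, y_hat


# Hypothetical values chosen only for illustration.
x, y_true = 1.0, 1.0
w1, w2 = 0.5, -0.5
learning_rate = 0.5

h, y_hat = predict(x, w1, w2)
error_before = 0.5 * (y_hat - y_true) ** 2

# Backward step: start from the error at the output neuron...
delta_out = (y_hat - y_true) * y_hat * (1.0 - y_hat)
grad_w2 = delta_out * h
# ...and propagate it back through w2 to the hidden neuron (chain rule).
delta_hidden = delta_out * w2 * h * (1.0 - h)
grad_w1 = delta_hidden * x

# Gradient descent: move each weight against its gradient.
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2

_, y_hat_new = predict(x, w1, w2)
error_after = 0.5 * (y_hat_new - y_true) ** 2
print("error before:", error_before)
print("error after: ", error_after)
```

With these values, a single update already makes the error slightly smaller. Note how the hidden neuron never sees the label directly: its correction is derived entirely from the error signal passed back by the output neuron.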
The process of forward propagation from input to output and backward propagation of errors is repeated many times, until the error falls below a predefined threshold. The whole process is represented in Figure 40:
Figure 40: Forward propagation and backward propagation
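Putting the two steps together gives a training loop. The following sketch is one possible arrangement, not a reference implementation: it assumes a toy dataset (the logical OR of two binary features), a network with two hidden neurons, sigmoid activations, a squared-error loss, and per-example gradient descent updates. The threshold, learning rate, and epoch cap are arbitrary illustrative choices; the cap is added so that the loop always terminates.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy training set (hypothetical): the logical OR of two binary features.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0.0, 1.0, 1.0, 1.0]

rng = random.Random(42)
n_hidden = 2
# Each hidden row holds two input weights followed by a bias.
w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n_hidden)]
# Output weights: one per hidden neuron, followed by a bias.
w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]

learning_rate = 0.5
threshold = 0.01    # stop once the mean error drops below this value
max_epochs = 5000   # safety cap on the number of passes over the data
history = []

for epoch in range(max_epochs):
    total_error = 0.0
    for x, y in zip(features, labels):
        # Forward step: input -> hidden layer -> output.
        hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
        y_hat = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
        total_error += 0.5 * (y_hat - y) ** 2

        # Backward step: output error first, then the hidden-layer errors.
        delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
        delta_hidden = [
            delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
            for j in range(n_hidden)
        ]

        # Weight update (gradient descent).
        for j in range(n_hidden):
            w_out[j] -= learning_rate * delta_out * hidden[j]
            w_hidden[j][0] -= learning_rate * delta_hidden[j] * x[0]
            w_hidden[j][1] -= learning_rate * delta_hidden[j] * x[1]
            w_hidden[j][2] -= learning_rate * delta_hidden[j]
        w_out[-1] -= learning_rate * delta_out

    mean_error = total_error / len(features)
    history.append(mean_error)
    if mean_error < threshold:
        break

print("epochs run:", len(history))
print("first mean error:", history[0])
print("last mean error: ", history[-1])
```

The history list records the mean error after every pass over the training set, so comparing its first and last entries shows how far the loop has reduced the error.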
The features represent the input, and the labels are used here to drive the learning process. The model is updated in such a way that the loss function is progressively minimized. In a neural network, what really matters is not the output of a single neuron but the collective weights adjusted in each layer. Therefore, the network progressively adjusts its internal weights so that its predictions match the correct labels for a growing number of training examples. Of course, using the right set of features and having quality labeled data is fundamental in order to minimize the bias during the learning process.