Training the network
The most common way to train NNs these days is with the backward propagation of errors algorithm, or backpropagation (often backprop for short). As we have already seen, individual neurons look a lot like linear or logistic regression, so it should come as no surprise that backpropagation usually goes hand in hand with our old friend, the gradient descent algorithm. NN training works in the following way (a code sketch of these steps follows the list):
- Forward pass—the input is presented to the first layer and transformations are applied to it layer by layer until the prediction is output by the last layer.
- Loss computation—the prediction is compared to the ground truth, and an error value is calculated for each neuron of the output layer using the loss function J.
- The errors are then propagated backward (backpropagation), such that each neuron has an error associated with it, proportional to its contribution to the output.
- Weights (w) are updated using one step of gradient descent. The gradient of the loss function is calculated...
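To make these steps concrete, here is a minimal sketch of one training loop for a tiny two-layer network with a sigmoid hidden layer, a linear output, and a squared-error loss J. The data, layer sizes, and learning rate are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples with 3 features and 1 target each (hypothetical values).
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Randomly initialized weights for a 3 -> 5 -> 1 network.
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 1))
lr = 0.1  # learning rate for the gradient descent step

for epoch in range(100):
    # 1) Forward pass: apply the transformations layer by layer.
    h = sigmoid(X @ W1)        # hidden-layer activations
    y_hat = h @ W2             # prediction from the output layer

    # 2) Loss computation: compare the prediction to the ground truth.
    J = 0.5 * np.mean((y_hat - y) ** 2)

    # 3) Backpropagation: push the error backward so each weight gets a
    #    gradient proportional to its contribution to the output error.
    dJ_dyhat = (y_hat - y) / len(X)          # error at the output layer
    dJ_dW2 = h.T @ dJ_dyhat                  # gradient for the output weights
    dJ_dh = dJ_dyhat @ W2.T                  # error propagated to the hidden layer
    dJ_dW1 = X.T @ (dJ_dh * h * (1 - h))     # chain rule through the sigmoid

    # 4) One step of gradient descent on the weights w.
    W1 -= lr * dJ_dW1
    W2 -= lr * dJ_dW2
```

In a real setting the loop, layers, and gradients would be handled by a framework's automatic differentiation rather than written out by hand, but the four steps it performs are exactly the ones listed above.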