Training NNs
The NN function, $f_\theta(\mathbf{x})$, approximates an unknown target function, $f(\mathbf{x})$: $f_\theta(\mathbf{x}) \approx f(\mathbf{x})$. The goal of training is to find the parameters, $\theta$, for which $f_\theta(\mathbf{x})$ best approximates $f(\mathbf{x})$. First, we'll see how to do that for a
single-layer network, using an optimization algorithm called gradient descent (GD). Then, we'll extend it to a deep feedforward network with the help of backpropagation (BP).
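To make the idea of a parameterized function concrete, here is a minimal NumPy sketch of a single-layer network; the sigmoid activation and the names f_theta, theta, and x are illustrative assumptions, not something fixed by the text:

```python
import numpy as np

# Hypothetical single-layer network: f_theta(x) = sigmoid(theta . x).
# The sigmoid activation is an assumption for illustration only.
def f_theta(theta: np.ndarray, x: np.ndarray) -> float:
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

theta = np.array([0.5, -1.0, 2.0])   # the parameters we want to train
x = np.array([1.0, 0.0, 1.0])        # one input sample
print(f_theta(theta, x))             # the network's output for x
```

Training means adjusting theta until outputs like this one match the training labels as closely as possible.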
Note
We should note that an NN and its training algorithm are two separate things. In principle, we could adjust the weights of a network in some way other than GD and BP, but this combination is the most popular and efficient one, and it is effectively the only one used in practice today.
GD
For the purposes of this section, we'll train a simple NN using the mean square error (MSE) cost function. It measures the difference (known as the error) between the network output and the training data labels over all $n$ training samples:

$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( f_\theta\left(\mathbf{x}^{(i)}\right) - y^{(i)} \right)^2$$

Here, $\mathbf{x}^{(i)}$ is the $i$-th training sample and $y^{(i)}$ is its label.
At first, this might look scary, but fear not! Behind the scenes, it’s very simple and straightforward...
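To show just how simple it is, here is a minimal NumPy sketch that computes the MSE for a single-layer linear network and minimizes it with plain gradient descent. The toy data, the learning rate lr, and the helper names mse and mse_gradient are all illustrative assumptions, not part of the original text:

```python
import numpy as np

def mse(theta: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Mean square error over all n training samples."""
    predictions = x @ theta                  # network output f_theta(x)
    return float(np.mean((predictions - y) ** 2))

def mse_gradient(theta: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of the MSE cost with respect to theta."""
    n = len(y)
    predictions = x @ theta
    return (2.0 / n) * x.T @ (predictions - y)

# Toy data: 100 samples with 3 features, labels from known parameters
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = x @ true_theta

theta = np.zeros(3)                          # initial parameters
lr = 0.1                                     # learning rate (step size)

for step in range(100):                      # gradient descent loop
    theta -= lr * mse_gradient(theta, x, y)  # step against the gradient

print(mse(theta, x, y))                      # cost shrinks toward 0
```

Each iteration nudges $\theta$ in the direction that reduces the cost the fastest, which is the essence of GD; the next sections build on this idea.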