Training
Training a network presupposes that its topology has already been designed. For design guidelines based on the type of input data and the expected use cases, we recommend the corresponding Auto-Encoder section in Chapter 4, Unsupervised Feature Learning.
Once we have defined the topology of the neural network, we are only at the starting point: the model still needs to be fitted during the training phase. We will look at a few techniques for scaling and accelerating our training algorithm that are well suited to production environments with large datasets.
Weights initialization
The final convergence of a neural network can be strongly influenced by its initial weights. Depending on the activation function we have selected, we want the gradient to have a steep slope in the first iterations, so that gradient descent can move quickly toward the region of the optimum.
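To make the point about slope concrete, here is a minimal sketch (our own illustration, not taken from the original text) that measures the average activation derivative of a single first-layer hidden unit right after initialization. It assumes inputs drawn uniformly from [-1, 1], no bias, and weights drawn uniformly from [-a, a]; the helper name mean_initial_slope and the comparison scale a = 1/sqrt(n) are hypothetical choices. When the weights are large, the weighted sum tends to land in the flat, saturated tails of the sigmoid or tanh, where the derivative is close to zero, so the first gradient descent steps barely move.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic sigmoid (avoids overflow in exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_slope(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)); maximum 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_slope(z):
    # Derivative of tanh: 1 - tanh(z)^2; maximum 1.0 at z = 0.
    return 1.0 - math.tanh(z) ** 2

def mean_initial_slope(n_inputs, weight_scale, slope_fn, n_trials=2000, seed=0):
    """Average activation slope of one first-layer hidden unit at initialization.

    Hypothetical setup: inputs x_i ~ U(-1, 1), weights w_i ~ U(-weight_scale, weight_scale),
    no bias. Returns the mean of slope_fn(sum_i w_i * x_i) over n_trials random draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Weighted sum of the inputs for freshly initialized weights.
        z = sum(rng.uniform(-weight_scale, weight_scale) * rng.uniform(-1.0, 1.0)
                for _ in range(n_inputs))
        total += slope_fn(z)
    return total / n_trials

if __name__ == "__main__":
    n = 100
    for scale in (1.0 / math.sqrt(n), 5.0):
        print(f"scale={scale:.2f}  "
              f"sigmoid slope={mean_initial_slope(n, scale, sigmoid_slope):.3f}  "
              f"tanh slope={mean_initial_slope(n, scale, tanh_slope):.3f}")
```

With the small scale the average slope stays close to the maximum of each activation (0.25 for the sigmoid, 1.0 for tanh), while with the large scale it collapses toward zero. Shrinking the weight range as the number of inputs grows is one common way to keep the weighted sum in the steep part of the curve.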
For a hidden unit j in the first layer (directly connected to the input layer), the sum of...