Optimizers and the Learning Rate
In the previous section, we saw that a neural network follows an iterative process to find the best solution for any input dataset. Its learning process is an optimization process. You can use different optimization algorithms (also called optimizers) for a neural network. The most popular ones are Adam, SGD, and RMSprop.
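As a rough illustration of how these optimizers are swapped in practice, here is a minimal sketch assuming TensorFlow/Keras; the model architecture, layer sizes, and loss are arbitrary placeholders rather than a recommendation:

```python
import tensorflow as tf

# A small placeholder model; the architecture is arbitrary and only for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Any of the popular optimizers can be passed to compile(); Adam is a common default.
optimizer = tf.keras.optimizers.Adam()       # adaptive moment estimation
# optimizer = tf.keras.optimizers.SGD()      # plain stochastic gradient descent
# optimizer = tf.keras.optimizers.RMSprop()  # per-parameter adaptive scaling

model.compile(optimizer=optimizer,
              loss="binary_crossentropy",
              metrics=["accuracy"])
```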
One important parameter of a neural network's optimizer is the learning rate. This value defines how quickly the neural network updates its weights. Setting the learning rate too low slows down the learning process, and the neural network will take a long time to find the right parameters. On the other hand, setting the learning rate too high can prevent the neural network from learning a solution, because it makes bigger weight changes than required. A good practice is to start with a learning rate that is not too small (such as 0.01 or 0.001), then stop training once the network's score starts to plateau or get worse, and...
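To make the learning-rate advice concrete, here is a minimal sketch, again assuming TensorFlow/Keras; the synthetic data, the starting rate of 0.001, and the patience value are all illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Synthetic data purely for illustration (hypothetical shapes and labels).
X_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Start with a learning rate that is not too small, such as 0.001.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer,
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop training once the validation score plateaus or starts to get worse.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",           # watch the validation loss
    patience=5,                   # tolerate a few epochs without improvement
    restore_best_weights=True,    # keep the best weights seen so far
)

model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=100,
          callbacks=[early_stopping],
          verbose=0)
```

The EarlyStopping callback here is one common way to detect the plateau described above; learning-rate schedulers such as ReduceLROnPlateau are another option.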