Optimizing Neural Networks
In this chapter, we're going to discuss the most important optimization algorithms that have been derived from the basic Stochastic Gradient Descent (SGD) approach. Plain SGD can be quite inefficient when working with very high-dimensional functions, causing the model to get stuck in sub-optimal solutions (such as plateaus and saddle points). The optimizers discussed in this chapter aim to speed up convergence and avoid such sub-optimality. We'll also discuss how to apply L1 and L2 regularization to the layers of a deep neural network, and how to prevent overfitting using these advanced approaches.
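As a quick reminder of the baseline that all of these methods build on, the following minimal sketch shows a single vanilla SGD update, θ ← θ − η∇L(θ), on one mini-batch. The quadratic loss, batch size, and learning rate are illustrative assumptions, not values prescribed by this chapter:

```python
import numpy as np

def gradient(theta, X, y):
    # Gradient of a mean squared error loss L(theta) = ||X @ theta - y||^2 / (2m)
    return X.T @ (X @ theta - y) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))   # mini-batch of 32 samples with 10 features
y = rng.normal(size=32)         # illustrative targets
theta = np.zeros(10)            # parameters to optimize
eta = 0.01                      # learning rate (hyperparameter)

# Vanilla SGD step: theta <- theta - eta * grad(L)(theta)
theta -= eta * gradient(theta, X, y)
```

The optimizers covered in this chapter replace this plain update with rules that adapt the step size or accumulate past gradients.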
In particular, the topics covered in the chapter are as follows:
- Optimized SGD algorithms (Momentum, RMSProp, Adam, AdaGrad, and AdaDelta)
- Regularization techniques and dropout (see the sketch after this list)
- Batch normalization
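To make the regularization topic concrete, the following sketch assumes a Keras/TensorFlow workflow (the layer sizes, penalty coefficient, and dropout rate are placeholders, not values from this chapter) and shows how an L2 weight penalty and dropout can be attached to a layer:

```python
import tensorflow as tf

# A minimal sketch: a dense layer with an L2 penalty on its weights,
# followed by dropout. All sizes and coefficients are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                            # hypothetical input dimensionality
    tf.keras.layers.Dense(
        64,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 weight penalty
    tf.keras.layers.Dropout(0.5),                            # drop 50% of units during training
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='binary_crossentropy')
```

An L1 penalty (or a combined one) can be applied in the same way by swapping in `tf.keras.regularizers.l1` or `tf.keras.regularizers.l1_l2`.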
Having discussed the basic concepts of neural modeling in the previous chapter, we can now focus on how to improve the...