In Chapter 2, Feed-Forward Neural Networks, we briefly demonstrated the use of different optimizers, although only on a single small test case. As with other machine learning algorithms, the most widely used and well-known optimizer for deep learning is Stochastic Gradient Descent (SGD). Other optimizers are variants of SGD that try to speed up convergence by adding heuristics, and some of them have fewer hyperparameters to tune. The table in Chapter 2, Feed-Forward Neural Networks, gives an overview of the most commonly used optimizers in deep learning.
One could argue that the choice of optimizer largely depends on the user's ability to tune it. There is definitely no single solution that works best for all problems. However, some optimizers have fewer parameters to tune and have proven to outperform other optimizers with their default settings.
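To make this concrete, the sketch below trains the same tiny network with three optimizers at their default settings and compares the results. It is a minimal illustration assuming TensorFlow/Keras; the synthetic dataset, architecture, and training budget are hypothetical choices, not the book's original example.

```python
# Minimal sketch: comparing optimizers with their default settings.
# Assumes TensorFlow/Keras; the toy data and model are illustrative.
import numpy as np
import tensorflow as tf

# Hypothetical binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, :2].sum(axis=1) > 0).astype("float32")

def build_model():
    # Same small feed-forward network for every optimizer,
    # so differences come from the optimizer alone.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Plain SGD versus two SGD variants (Adam, RMSprop) that add
# heuristics such as momentum and per-parameter adaptive learning
# rates; each is used with its Keras defaults.
for name in ["sgd", "adam", "rmsprop"]:
    model = build_model()
    model.compile(optimizer=name,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(X, y, epochs=10, batch_size=32, verbose=0)
    print(f"{name}: final training accuracy "
          f"{history.history['accuracy'][-1]:.3f}")
```

On a toy problem like this, the adaptive variants typically converge faster out of the box than plain SGD, which often needs its learning rate tuned, illustrating why optimizers with good defaults are attractive in practice.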