For further information, refer to the following links:
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, by John Duchi et al., http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
- Adadelta: An Adaptive Learning Rate Method, by Matthew D. Zeiler, https://arxiv.org/pdf/1212.5701.pdf
- Adam: A Method for Stochastic Optimization, by Diederik P. Kingma and Jimmy Lei Ba, https://arxiv.org/pdf/1412.6980.pdf
- On the Convergence of Adam and Beyond, by Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar, https://openreview.net/pdf?id=ryQu7f-RZ
- Incorporating Nesterov Momentum into Adam, by Timothy Dozat, http://cs229.stanford.edu/proj2015/054_report.pdf