History
The basics of continuous backpropagation were proposed by Henry J. Kelley [1] in 1960 using dynamic programming. Stuart Dreyfus proposed using the chain rule in 1962 [2]. Paul Werbos was the first to propose using backpropagation for neural networks, in his 1974 PhD thesis [3]. However, backpropagation only gained widespread success in 1986, with the work of David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams published in Nature [4]. In 1987, Yann LeCun described the modern version of backpropagation currently used for training neural networks [5].
The basic intuition behind stochastic gradient descent (SGD) was introduced by Robbins and Monro in 1951, in a context other than neural networks [6]. It was only in 2012, 52 years after backpropagation was first introduced, that AlexNet [7] achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge using GPUs. According to The Economist [8], "Suddenly people started to pay attention, not just within the AI community...