The back-propagation algorithm
This algorithm is more of a methodology than a single concrete procedure: it was designed to be flexible enough to adapt to any kind of neural architecture without substantial changes. Therefore, in this section we'll define the main concepts without focusing on a particular case. Readers interested in implementing it will be able to apply the same techniques to different kinds of networks with minimal effort (assuming that all requirements are met).
Training a deep learning model is normally carried out by minimizing a cost function. Let's suppose we have a network parameterized with a global vector \(\bar{\theta}\). In that case, the cost function (using the same symbol L for both the per-sample loss and the cost, with different parameters to disambiguate) is defined as the average loss over the M training samples:

\[
L(X, Y; \bar{\theta}) = \frac{1}{M} \sum_{i=1}^{M} L(\bar{x}_i, \bar{y}_i; \bar{\theta})
\]
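As a concrete illustration of averaging per-sample losses into a cost, here is a minimal NumPy sketch. The squared-error loss and the toy prediction/target values are illustrative assumptions, not something the text prescribes:

```python
import numpy as np

def loss(y_hat, y):
    # Per-sample loss; squared error is chosen here purely for illustration
    return 0.5 * (y_hat - y) ** 2

def cost(y_hat, y):
    # Empirical cost: the mean of the per-sample losses over the dataset
    return np.mean(loss(y_hat, y))

# Hypothetical predictions and targets for four samples
y_hat = np.array([0.9, 0.2, 0.8, 1.0])
y = np.array([1.0, 0.0, 1.0, 1.0])

print(cost(y_hat, y))
```

Any differentiable per-sample loss (cross-entropy, Huber, and so on) could replace `loss` here; the cost is always the same dataset-level average.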
We've already explained that minimizing the previous expression (the empirical risk) is a way to approximately minimize the true expected risk and...