5. Backpropagation
The previous chapter described how a neural network is trained. There, the gradient of the loss function with respect to each weight parameter was obtained by numerical differentiation. Numerical differentiation is simple and easy to implement, but it has the disadvantage of being computationally slow. This chapter covers backpropagation, a more efficient way to calculate the gradients of weight parameters.
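To see why numerical differentiation is slow, here is a minimal sketch of the central-difference approach from the previous chapter (the function name `numerical_gradient` and the step size `h` are illustrative). Each parameter requires two evaluations of the loss function, so a network with a million weights needs two million forward passes for a single gradient, whereas backpropagation obtains all gradients in roughly one forward and one backward pass.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central-difference numerical gradient of f at x.

    f : loss function taking the parameter array x
    x : NumPy array of parameters (modified in place, then restored)
    """
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]

        x[idx] = orig + h
        fxh1 = f(x)              # f(x + h)

        x[idx] = orig - h
        fxh2 = f(x)              # f(x - h)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = orig            # restore the original value
        it.iternext()
    return grad
```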
There are two ways to properly understand backpropagation. One uses equations, while the other uses computational graphs. The former is the common approach, and many books on machine learning take it, focusing on the formulas. It is rigorous and compact, but it can obscure the essential ideas or leave you facing what seems like a meaningless list of equations.
Therefore, this chapter will use computational graphs so that you can understand backpropagation "visually."...