How can we update the network parameters so that they minimize the loss? For each parameter, what we need to know is how slightly changing its value would affect the loss. If we know which changes would slightly decrease the loss, then it is just a matter of applying these changes and repeating the process until reaching a minimum. This is exactly what the gradient of the loss function expresses, and what the gradient descent process is.
At each training iteration, the derivatives of the loss with respect to each parameter of the network are computed. These derivatives indicate which small changes to the parameters need to be applied (with a -1 coefficient since the gradient indicates the direction of increase of the function, while we want to minimize it). It can be seen as walking step by step down the slope of the loss function with respect to each parameter, hence the name gradient descent for this iterative process (refer to the following diagram...