The goal of the loss function is to evaluate how well the network, with its current weights, is performing. More formally, this function expresses the quality of the predictions as a function of the network's parameters (such as its weights and biases). The smaller the loss, the better the parameters are for the chosen task.
Since loss functions represent the goal of networks (return the correct labels, compress the image while preserving the content, and so on), there are as many different functions as there are tasks. Still, some loss functions are more commonly used than others. This is the case for the sum-of-squares function, also called L2 loss (based on the L2 norm), which is omnipresent in supervised learning. This function simply sums the squared differences between each element of the output vector y (the per-class probabilities estimated by our network) and each element of the ground truth vector ytrue (the target vector with null values for every class except the correct one, which is set to 1).
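The L2 loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name and the three-class example values are chosen here for demonstration only:

```python
import numpy as np

def l2_loss(y_pred, y_true):
    """Sum-of-squares (L2) loss: the sum of the squared differences
    between the predicted vector and the ground truth vector."""
    return np.sum(np.square(y_pred - y_true))

# Hypothetical 3-class example: the network outputs per-class
# probabilities, and the ground truth is a one-hot vector
# (null values everywhere except the correct class).
y_pred = np.array([0.2, 0.7, 0.1])
y_true = np.array([0.0, 1.0, 0.0])

loss = l2_loss(y_pred, y_true)
# (0.2 - 0)^2 + (0.7 - 1)^2 + (0.1 - 0)^2 = 0.04 + 0.09 + 0.01 = 0.14
print(loss)
```

Note that the closer the predicted probabilities are to the one-hot target, the smaller the loss, which matches the principle stated above: the smaller the loss, the better the parameters for the task.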