Training a CNN
During training, a CNN learns the weights of the filters in the feature extraction layers and the weights of the fully connected layers in the neural network. To understand how a model is trained, we'll discuss how the probability of each output class is calculated, how the error (the loss) is calculated, and finally, how that loss is minimized by updating the weights:
- Probabilities
Recall that in the last layer of the neural network section, we used a softmax function to calculate the probability of each output class. This probability is calculated by dividing the exponential of that class's score by the sum of the exponentials of all class scores:
Figure 4.12: Expression to calculate probability
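The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code a deep learning library actually runs: the function name `softmax`, the variable names, and the three example scores are assumptions made for this example. In symbols, with s_i as the score of class i, the probability is p_i = exp(s_i) / (sum of exp(s_j) over all classes j).

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score before exponentiating avoids overflow
    # in exp(); the common factor cancels, so the result is unchanged.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    # Each probability is that class's exponential divided by the sum
    # of the exponentials of all class scores.
    return [e / total for e in exps]

# Hypothetical scores for three output classes (illustrative values only)
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)
```

The class with the highest score receives the highest probability, and the probabilities always add up to 1, which is what allows them to be compared against the actual class when calculating the loss.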
- Loss
We need to be able to quantify how well the calculated probabilities predict the actual class. This is done by calculating a loss, which for classification problems is typically done with the categorical cross-entropy loss function. The categorical...