Whenever we evaluate a network, what we actually care about is its accuracy on the test set, that is, how well it classifies images it has never seen before. This holds for any ML model, because accuracy on the training set is not a reliable indicator of how well the model generalizes.
In our case, the test set accuracy is 95.78%, marginally lower than our training set accuracy of 96%. This gap is a classic sign of overfitting: the model has captured some irrelevant noise in the training data and used it to predict the training images. Since that inherent noise is different in our randomly selected test set, the network could not rely on the useless representations it had previously picked up on, and so performed slightly worse during testing. As we will see throughout this book, when testing a neural network, it is important to ensure that it has learnt correct and efficient representations...
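The comparison above can be made concrete with a short evaluation routine. The following is a minimal sketch, assuming a trained PyTorch classifier named model and two DataLoaders, train_loader and test_loader (these names are hypothetical and not from the text); the same idea applies to any framework.

```python
import torch

def accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified images in `loader`."""
    model.eval()                       # disable dropout / batch-norm updates
    correct, total = 0, 0
    with torch.no_grad():              # no gradients needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)   # predicted class per image
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

train_acc = accuracy(model, train_loader)
test_acc = accuracy(model, test_loader)

# A test accuracy below the training accuracy (e.g. 95.78% vs 96%)
# indicates some degree of overfitting; the larger the gap, the worse.
print(f"train: {train_acc:.2%}  test: {test_acc:.2%}  gap: {train_acc - test_acc:.2%}")
```

Reporting the gap between the two accuracies, rather than either number alone, is what tells us whether the model is merely memorizing its training data.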