Now, let's go ahead and build our model. So, we have 10 classes in our dataset 0-9 and the goal is to classify any input image into one of these classes. Instead of giving a hard decision about the input image by saying only which class it could belong to, we are going to produce a vector of 10 possible values (because we have 10 classes). It'll represent the probabilities of each digit from 0-9 being the correct class for the input image.
For example, suppose we feed the model with a specific image. The model might be 70% sure that this image is 9, 10% sure that this image is 8, and so on. So, we are going to use the softmax regression here, which will produce values between 0 and 1.
A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that...