In this section, we'll learn about the output activation function known as softmax. We'll take a look at how it relates to output classes, and at how softmax generates probabilities.
Let's take a look! When we're building a classifier, the neural network is going to output a stack of numbers, usually an array with one slot corresponding to each of our classes. In the case of the model we're looking at here, those classes are the digits from zero to nine. What softmax does is convert that stack of raw scores into a set of probability scores that all sum up to one.
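As a rough illustration, here is a minimal sketch of how softmax could be computed in Python with NumPy; the three raw scores below are made-up values, not outputs from the model we're discussing.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of raw scores (logits) into probabilities that sum to 1."""
    # Subtract the max for numerical stability; this doesn't change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw outputs for a three-class example.
raw_scores = np.array([2.0, 1.0, 0.1])
probs = softmax(raw_scores)

print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```

Subtracting the maximum score before exponentiating is a common trick to avoid overflow with large logits; it leaves the resulting probabilities unchanged.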
This is important because it lets you see which answer is the most probable. So, as an example we can use to understand softmax, let's look at our array of values. We can see that there are three values. Let's assume that the neural...