So far, we've used neural networks as discriminative models. This simply means that, given input data, a discriminative model will map it to a certain label (in other words, a classification). A typical example is the classification of MNIST images in 1 of 10 digit classes, where the neural network maps input data features (pixel intensities) to the digit label. We can also say this in another way: a discriminative model gives us the probability of (class), given (input). In the case of MNIST, this is the probability of the digit when given the pixel intensities of the image.
On the other hand, a generative model learns how classes are distributed. You can think of it as the opposite of what the discriminative model does. Instead of predicting the class probability, , given certain input features, it tries to predict the...