So far, we've used neural networks as discriminative models. This simply means that, given input data, a discriminative model maps it to a certain label (in other words, a classification). A typical example is the classification of MNIST images into one of 10 digit classes, where the neural network maps the input data features (pixel intensities) to the digit label. We can also say this in another way: a discriminative model gives us the probability of y (class), given x (input), written P(y | x). In the MNIST case, this is the probability of the digit, given the pixel intensities of the image.
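To make the idea of "the probability of the class, given the input" concrete, here is a minimal sketch of a discriminative model in plain Python. It is not a trained network: the single linear layer, the random `weights` and `biases`, the random `pixels`, and the helper names `softmax` and `predict_proba` are all assumptions made for illustration. It only shows the shape of the mapping from 784 pixel intensities to 10 class probabilities.

```python
import math
import random

NUM_PIXELS = 28 * 28   # MNIST images are 28x28 grayscale
NUM_CLASSES = 10       # digits 0-9


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(pixels, weights, biases):
    """Return P(y | x) for each digit class y, given pixel intensities x."""
    scores = [
        sum(w * p for w, p in zip(class_weights, pixels)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(scores)


rng = random.Random(0)
# Hypothetical, untrained parameters: random values stand in for learned ones.
weights = [[rng.gauss(0.0, 0.01) for _ in range(NUM_PIXELS)]
           for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES
# A random stand-in for one image, with intensities in [0, 1].
pixels = [rng.random() for _ in range(NUM_PIXELS)]

probs = predict_proba(pixels, weights, biases)
# The predicted label is the class with the highest probability.
predicted_digit = max(range(NUM_CLASSES), key=lambda d: probs[d])
print(predicted_digit, round(sum(probs), 6))
```

Whatever the parameters are, the output is always a distribution over the 10 labels for one fixed input. The model says nothing about how likely the input itself is.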
On the other hand, a generative model learns the distribution of the classes. You can think of it as the opposite of what the discriminative model does. Instead of predicting the class probability, y, given certain input features, it tries to predict the...