Now we will look at another very interesting variant of the GAN, called the Wasserstein GAN (WGAN). It uses the Wasserstein distance in the GAN's loss function. First, let's understand why we need the Wasserstein distance measure and what's wrong with our current loss function.
Before going ahead, let's briefly explore two popular divergence measures that are used to measure the similarity between two probability distributions.
The Kullback-Leibler (KL) divergence is one of the most popularly used measures for determining how one probability distribution diverges from another. Let's say we have two discrete probability distributions, $P$ and $Q$; then the KL divergence can be expressed as follows:

$$D_{KL}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$
When the two distributions are continuous, the KL divergence can be expressed in the following integral form:

$$D_{KL}(P \parallel Q) = \int_{x} P(x) \log \frac{P(x)}{Q(x)} \, dx$$
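To make the discrete formula concrete, here is a minimal sketch that computes the KL divergence between two made-up distributions. The helper name `kl_divergence` and the example values of `p` and `q` are my own; the result is cross-checked against `scipy.stats.entropy`, which computes the same quantity when given two distributions:

```python
import numpy as np
from scipy.stats import entropy

def kl_divergence(p, q):
    """Discrete KL divergence: D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum only over outcomes where P(x) > 0; 0 * log(0) terms contribute nothing
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two made-up discrete distributions over three outcomes
p = [0.36, 0.48, 0.16]
q = [0.30, 0.50, 0.20]

print(kl_divergence(p, q))   # ~0.0103 nats
print(entropy(p, q))         # SciPy gives the same value
print(kl_divergence(q, p))   # a different value: KL divergence is not symmetric
```

Note that swapping the arguments changes the result, which hints at a limitation we will return to shortly.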
The KL divergence is not symmetric; that is, $D_{KL}(P \parallel Q)$ is generally not equal to $D_{KL}(Q \parallel P)$.