Least-squares GAN (LSGAN)
As discussed in the previous section, the original GAN is difficult to train. The problem arises because, when the GAN optimizes its loss function, it is actually optimizing the Jensen-Shannon divergence, D_JS. It is difficult to optimize D_JS when there is little to no overlap between the two distributions (the real data distribution and the generator's distribution).
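To make the overlap problem concrete, the following is a minimal sketch using toy discrete distributions rather than a real GAN. The helper names (`kl_divergence`, `js_divergence`, `point_mass`) and the 10-bin grid are assumptions made for illustration only. It shows that once two distributions stop overlapping, the Jensen-Shannon divergence stays at log 2 (about 0.693) no matter how far apart they are, so it offers no signal about which direction reduces the gap.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; bins where p is 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    # Mixture distribution m = (p + q) / 2
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def point_mass(position, size=10):
    """Hypothetical helper: a distribution with all mass in a single bin."""
    return [1.0 if i == position else 0.0 for i in range(size)]


real = point_mass(0)
for shift in (0, 1, 5, 9):
    fake = point_mass(shift)
    # shift == 0 gives full overlap; any other shift gives no overlap at all
    print(f"shift={shift}  D_JS={js_divergence(real, fake):.4f}")
```

Only the fully overlapping case (shift 0) gives a divergence of 0; every non-overlapping case gives the same value, which is why a generator trained against this quantity receives little useful gradient.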
WGAN addresses the problem by using the Earth Mover's Distance (EMD), or Wasserstein-1, loss function, which remains smooth and differentiable even when there is little or no overlap between the two distributions. However, WGAN is not concerned with the quality of the generated images. Apart from stability issues, there is still room for improvement in the perceptual quality of the images generated by the original GAN. LSGAN theorizes that the twin problems can be solved simultaneously.
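The behavior of the Wasserstein-1 distance under no overlap can be sketched on a toy example. This is not how WGAN computes its loss during training (WGAN estimates the distance with a critic network); it is a direct calculation on a unit-spaced 1-D grid, where the distance equals the summed absolute difference between the two cumulative distributions. The helper names `wasserstein_1` and `point_mass` are assumptions made for illustration.

```python
def wasserstein_1(p, q):
    """Wasserstein-1 distance between two discrete distributions that live
    on the same unit-spaced 1-D grid (sum of absolute CDF differences)."""
    distance = 0.0
    cdf_gap = 0.0  # running difference between the two CDFs
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap)
    return distance


def point_mass(position, size=10):
    """Hypothetical helper: a distribution with all mass in a single bin."""
    return [1.0 if i == position else 0.0 for i in range(size)]


real = point_mass(0)
for shift in (0, 1, 5, 9):
    fake = point_mass(shift)
    # The distance grows with the shift even though the supports never overlap
    print(f"shift={shift}  W1={wasserstein_1(real, fake)}")
```

In this toy setting the distance grows in proportion to how far the fake distribution sits from the real one, so moving the fake distribution closer always lowers the loss, even with no overlap.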
LSGAN proposes the least squares loss. Figure 5.2.1 demonstrates why the use of a sigmoid cross-entropy loss in the GAN results in poor generated data quality. Ideally, the fake samples distribution...