For the dataset we selected for this demonstration, the discriminator became very good at distinguishing real images from fake ones, and therefore provided little gradient feedback to the generator. Hence we had to weaken the discriminator using the following best practices:
- The learning rate of the discriminator is kept much higher than the learning rate of the generator.
- The discriminator uses the plain GradientDescent optimizer, while the generator uses Adam.
- The discriminator has dropout regularization while the generator does not.
- The discriminator has fewer layers and fewer neurons than the generator.
- The generator's output layer uses a tanh activation, while the discriminator's output layer uses a sigmoid.
- In the Keras model, we use 0.9 instead of 1.0 as the label for real data, and we use 0.1...
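The asymmetries listed above can be sketched framework-free in NumPy. This is a minimal illustration under stated assumptions, not the Keras/TensorFlow code used for this demonstration: the layer sizes in `G_SIZES` and `D_SIZES`, the ReLU hidden activations, the dropout rate of 0.3, and the learning rates `D_LR` and `G_LR` are hypothetical values chosen only to show the pattern, and backpropagation is omitted. The sketch shows a smaller discriminator with dropout and a sigmoid output, a deeper and wider generator with a tanh output, a plain gradient-descent update for the discriminator, an Adam update for the generator, and the smoothed real label of 0.9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the generator is deeper and wider than the discriminator.
G_SIZES = [100, 256, 512, 784]  # noise -> hidden -> hidden -> 28x28 image
D_SIZES = [784, 64, 1]          # image -> small hidden layer -> P(real)

# Hypothetical learning rates: the discriminator's is much higher.
D_LR, G_LR = 1e-2, 1e-4
REAL_LABEL = 0.9  # smoothed label for real images instead of 1.0


def init_params(sizes):
    """One (weights, bias) pair per dense layer."""
    return [(rng.normal(0.0, 0.02, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def generator(params, z):
    h = z
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        last = i == len(params) - 1
        # ReLU on hidden layers; tanh on the output so pixels lie in [-1, 1].
        h = np.tanh(h) if last else np.maximum(h, 0.0)
    return h


def discriminator(params, x, drop_rate=0.3, training=True):
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
            if training:  # inverted dropout, applied only in the discriminator
                keep = rng.random(h.shape) >= drop_rate
                h = h * keep / (1.0 - drop_rate)
    # Sigmoid output: estimated probability that x is a real image.
    return 1.0 / (1.0 + np.exp(-h))


def bce(p, y, eps=1e-7):
    """Binary cross-entropy; y may be a smoothed label such as 0.9."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def sgd_step(params, grads, lr=D_LR):
    """Plain gradient descent, used here for the discriminator."""
    return [(w - lr * gw, b - lr * gb)
            for (w, b), (gw, gb) in zip(params, grads)]


class Adam:
    """Adam with its standard default hyperparameters, used for the generator."""

    def __init__(self, params, lr=G_LR, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps, self.t = lr, b1, b2, eps, 0
        self.m = [np.zeros_like(a) for pair in params for a in pair]
        self.v = [np.zeros_like(a) for pair in params for a in pair]

    def step(self, params, grads):
        self.t += 1
        flat_p = [a for pair in params for a in pair]
        flat_g = [g for pair in grads for g in pair]
        out = []
        for i, (p, g) in enumerate(zip(flat_p, flat_g)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(p - self.lr * m_hat / (np.sqrt(v_hat) + self.eps))
        return [(out[j], out[j + 1]) for j in range(0, len(out), 2)]


# Usage: forward passes only; gradients would come from backprop (not shown).
g_params, d_params = init_params(G_SIZES), init_params(D_SIZES)
fake = generator(g_params, rng.normal(size=(16, 100)))
real = rng.uniform(-1.0, 1.0, size=(16, 784))  # stand-in for scaled images
loss_real = bce(discriminator(d_params, real), REAL_LABEL)
```

With a real label of 0.9, the binary cross-entropy on real images is minimized when the discriminator outputs 0.9 rather than 1.0, so pushing its outputs toward full certainty is penalized instead of rewarded.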