Fake samples generated with the GAN framework (Goodfellow et al., 2014) have fooled humans and machines into believing that they are indistinguishable from real samples. Although this may hold for the naked eye and for the discriminator that the generator fools, it is unlikely that fake samples are numerically indistinguishable from real samples. Inspired by formal methods, this paper evaluates fake samples with respect to statistical summaries and formal specifications computed on the real data.
Since the Generative Adversarial Networks paper (Goodfellow et al., 2014), most GAN-related publications have used a grid of image samples to accompany theoretical and empirical results. In contrast to Variational Autoencoders (VAEs) and other models (Goodfellow et al., 2014), the output of GAN-trained generators is mostly evaluated qualitatively:...