In this example, we improve the baseline model without modifying its architecture. The authors propose changing the optimization problem so that the discriminator also sees mismatched pairs of text embeddings and images.
This approach is called the Matching-Aware Discriminator, and it is designed to separate the sources of error in this task. In the baseline setup, the discriminator sees only two kinds of input during training: real images with their matching text, and synthetic images with arbitrary text. It therefore has to separate two sources of error implicitly: unrealistic images (for any text), and realistic-looking images that do not match the text description.
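As a rough illustration of the matching-aware idea, the sketch below computes a discriminator loss from three kinds of scores: real image with matching text, real image with mismatched text, and generated image with its text. It is a minimal sketch and not the authors' code. The function names `mismatch` and `matching_aware_d_loss`, the equal 0.5 weighting of the two negative terms, and the trick of building mismatched pairs by rotating the batch are all assumptions made for illustration. It also assumes the discriminator outputs a probability in (0, 1).

```python
import math


def mismatch(text_embeddings):
    """Pair each image with another sample's text by rotating the batch by one.

    Assumption: neighbouring samples describe different content. A real
    pipeline may need to check that the rotated text belongs to another class.
    """
    return text_embeddings[1:] + text_embeddings[:1]


def matching_aware_d_loss(s_real, s_wrong, s_fake, eps=1e-8):
    """Discriminator loss (to minimize) from three lists of scores in (0, 1).

    s_real:  D(real image, matching text)    -> should approach 1
    s_wrong: D(real image, mismatched text)  -> should approach 0
    s_fake:  D(generated image, its text)    -> should approach 0
    """
    total = 0.0
    for r, w, f in zip(s_real, s_wrong, s_fake):
        positive = math.log(r + eps)
        # Assumed weighting: the two "fake" cases share the negative term equally.
        negative = 0.5 * (math.log(1.0 - w + eps) + math.log(1.0 - f + eps))
        total += -(positive + negative)
    return total / len(s_real)


# Hypothetical scores: a discriminator that accepts a real image with the
# wrong caption is penalized more than one that rejects it.
rejects_mismatch = matching_aware_d_loss([0.9], [0.1], [0.1])
accepts_mismatch = matching_aware_d_loss([0.9], [0.9], [0.1])
print(accepts_mismatch > rejects_mismatch)
```

The mismatched-pair term is what gives the discriminator a direct training signal about whether the text matches the image, separately from whether the image looks real.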
To separate these two error sources, the authors explicitly give the discriminator a third kind of input, real images paired with mismatched text, and find empirically that this helps training. We'll provide a slice of the...