Text-to-image
Text-to-image GANs are conditional GANs. However, instead of using class labels as conditions, they use text as the condition to generate images. In early work, text embeddings were fed as conditions into both the generator and the discriminator. The architectures are similar to the conditional GANs we learned about in Chapter 4, Image-to-Image Translation; the main difference is that the text embedding is produced by a natural language processing (NLP) preprocessing pipeline. The following diagram shows the architecture of a text-conditional GAN:
Figure 10.5 – Text-conditional convolutional GAN architecture where text encoding is used by both the generator and discriminator (Redrawn from: S. Reed et al., 2016, "Generative Adversarial Text to Image Synthesis," https://arxiv.org/abs/1605.05396)
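A minimal Keras sketch of this conditioning scheme follows. The layer sizes, the text-embedding dimension TXT_DIM, and the builder names are illustrative assumptions rather than the exact architecture from the paper, and the text embedding itself is assumed to come from a pretrained NLP pipeline:

from tensorflow.keras import layers, Model

Z_DIM, TXT_DIM = 100, 256   # noise size and assumed text-embedding size

def build_generator():
    z = layers.Input(shape=(Z_DIM,), name="noise")
    txt = layers.Input(shape=(TXT_DIM,), name="text_embedding")
    t = layers.Dense(128, activation="relu")(txt)   # compress the embedding
    x = layers.Concatenate()([z, t])                # condition the latent code
    x = layers.Dense(4 * 4 * 256, activation="relu")(x)
    x = layers.Reshape((4, 4, 256))(x)
    for filters in (128, 64, 32):                   # upsample 4x4 -> 32x32
        x = layers.Conv2DTranspose(filters, 4, strides=2,
                                   padding="same", activation="relu")(x)
    img = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh")(x)   # 64x64x3 image
    return Model([z, txt], img, name="generator")

def build_discriminator():
    img = layers.Input(shape=(64, 64, 3), name="image")
    txt = layers.Input(shape=(TXT_DIM,), name="text_embedding")
    x = img
    for filters in (32, 64, 128, 256):              # downsample 64x64 -> 4x4
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    # Tile the compressed text embedding spatially and fuse it with the
    # image feature map before the real/fake decision.
    t = layers.Dense(128)(txt)
    t = layers.Reshape((1, 1, 128))(t)
    t = layers.UpSampling2D(size=(4, 4))(t)
    x = layers.Concatenate()([x, t])
    x = layers.Conv2D(1, 4, padding="valid")(x)     # 1x1 real/fake score
    out = layers.Flatten()(x)
    return Model([img, txt], out, name="discriminator")

In Reed et al.'s formulation, the discriminator is additionally trained on real images paired with mismatched text, so it learns to judge image-text consistency as well as realism.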
As with ordinary GANs, images generated at high resolution tend to be blurry. StackGAN resolves this by stacking two networks...
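To make the stacking idea concrete, here is a minimal sketch of a Stage-II refinement generator; the shapes and layer sizes are illustrative assumptions, not StackGAN's exact architecture. Stage-I, which looks much like the text-conditional generator sketched above, produces a coarse 64x64 image; Stage-II re-reads the text embedding and refines that image up to 256x256:

from tensorflow.keras import layers, Model

TXT_DIM = 256   # assumed text-embedding size, as in the sketch above

def build_stage2_generator():
    img = layers.Input(shape=(64, 64, 3), name="stage1_image")
    txt = layers.Input(shape=(TXT_DIM,), name="text_embedding")
    # Downsample the coarse Stage-I image to a 16x16 feature map.
    x = layers.Conv2D(64, 4, strides=2, padding="same",
                      activation="relu")(img)
    x = layers.Conv2D(128, 4, strides=2, padding="same",
                      activation="relu")(x)
    # Tile the compressed text embedding and fuse it spatially, so the
    # refinement stage can re-read the text description.
    t = layers.Dense(128, activation="relu")(txt)
    t = layers.Reshape((1, 1, 128))(t)
    t = layers.UpSampling2D(size=(16, 16))(t)
    x = layers.Concatenate()([x, t])
    # Upsample 16x16 -> 128x128, then a final layer up to 256x256.
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 4, strides=2,
                                   padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh")(x)   # 256x256x3 image
    return Model([img, txt], out, name="stage2_generator")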