Text to image
Text-to-image GANs are conditional GANs. Instead of conditioning on class labels, they condition image generation on words. Early text-to-image GANs fed word embeddings as the condition to both the generator and the discriminator. Their architectures are similar to those of the conditional GANs we learned about in Chapter 4, Image-to-Image Translation. The only difference is that the text embedding is produced by a natural language processing (NLP) preprocessing pipeline. The following diagram shows the architecture of a text-conditional GAN:
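To make the conditioning concrete alongside the diagram, here is a minimal NumPy sketch of how a text embedding can be fed to both networks. It is an illustration under stated assumptions, not the architecture of any particular paper: the toy vocabulary, the randomly initialized embedding table, the dimensions, and the single linear layers standing in for the generator and discriminator are all hypothetical. A real pipeline would use a pretrained text encoder and deep convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes (assumptions for illustration only).
VOCAB = {"a": 0, "red": 1, "blue": 2, "bird": 3, "flower": 4, "<unk>": 5}
EMBED_DIM = 8    # size of the text condition vector
NOISE_DIM = 16   # size of the latent noise vector z
IMG_DIM = 32     # size of a flattened toy "image"

# Random word vectors stand in for the output of an NLP preprocessing pipeline.
embedding_table = rng.normal(size=(len(VOCAB), EMBED_DIM))


def embed_text(caption):
    """Average the word vectors of a caption into one condition vector."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in caption.lower().split()]
    return embedding_table[ids].mean(axis=0)


# Single linear layers stand in for the real generator and discriminator.
W_g = rng.normal(scale=0.1, size=(NOISE_DIM + EMBED_DIM, IMG_DIM))
W_d = rng.normal(scale=0.1, size=(IMG_DIM + EMBED_DIM, 1))


def generator(z, text_vec):
    """Map noise plus text condition to a fake image with values in [-1, 1]."""
    g_in = np.concatenate([z, text_vec])  # condition joins the noise
    return np.tanh(g_in @ W_g)


def discriminator(image, text_vec):
    """Score how real an image looks given the same text condition."""
    d_in = np.concatenate([image, text_vec])  # condition joins the image
    logit = d_in @ W_d
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid, shape (1,)


text_vec = embed_text("a red bird")
z = rng.normal(size=NOISE_DIM)
fake_image = generator(z, text_vec)
score = discriminator(fake_image, text_vec)
```

The point to notice is that the same text vector is concatenated into the inputs of both networks, just as a class label would be in a label-conditional GAN. Only the source of the condition vector changes.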
As with ordinary GANs, the high-resolution images that text-to-image GANs generate tend to be blurry. StackGAN resolves this by stacking two networks...