In the previous chapters, we have been mainly dealing with image synthesis and image-to-image translation tasks. Now, it's time for us to move from the CV field to the NLP field and discover the potential of GANs in other applications. Perhaps you have seen some CNN models being used for image/video captioning. Wouldn't it be great if we could reverse this process and generate images from description text?
In this chapter, you will learn about the basics of word embeddings and how are they used in the NLP field. You will also learn how to design a text-to-image GAN model so that you can generate images based on one sentence of description text. Finally, you will understand how to stack two or more Conditional GAN models to perform text-to-image synthesis with much higher resolution with StackGAN and StackGAN++.
The following topics...