From Chapter 4, Building Your First GAN with PyTorch, to Chapter 8, Training Your GANs to Break Different Models, we have covered almost every basic application of GANs in computer vision, especially image synthesis. You may be wondering how GANs are used in other fields, such as text or audio generation. In this chapter, we will gradually move from computer vision (CV) to natural language processing (NLP) by combining the two fields to generate realistic images from descriptive text. This process is called text-to-image synthesis (or text-to-image translation).
We know that almost every GAN model generates synthesized data by establishing a definite mapping from a certain form of input data to the output data. Therefore, to generate an image from a corresponding description sentence, we need to understand how to represent sentences with...