In the previous chapter, we learned how to generate high-quality images from description text with GANs. Now, we will move on to sequential data synthesis, such as text and audio, using various GAN models.
When it comes to text generation, the biggest difference from image generation is that text data is discrete while pixel values are effectively continuous, even though digital images and text are both stored as discrete numbers. A pixel typically takes one of 256 values, and slightly changing a pixel won't necessarily affect the image's meaning to us. However, a slight change to a sentence – even a single letter (for example, turning we into he) – may change the meaning of the whole sentence. Also, we tend to be more tolerant of flaws in synthesized images than in synthesized text...
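This contrast can be made concrete with a small sketch. Below, a pixel value is nudged by one step out of 256 – a change that is smooth and nearly invisible – while a one-letter edit to a word maps it to an entirely different token ID; there is no "slightly different" word in between. The toy vocabulary and its IDs are purely illustrative:

```python
import numpy as np

# A pixel is quasi-continuous: nudging its value barely changes the image.
pixel = np.array([200], dtype=np.uint8)
nudged = pixel + 1  # 201 out of 256 levels: visually indistinguishable

# Text is discrete: a one-character edit jumps to a different token entirely.
vocab = {"we": 0, "he": 1}  # hypothetical toy vocabulary
original_word = "we"
edited_word = "he"  # turning "we" into "he" flips the sentence's subject

pixel_delta = int(nudged[0]) - int(pixel[0])
print(pixel_delta)                                    # a tiny, smooth change
print(vocab[edited_word] == vocab[original_word])     # a jump to a new token
```

This discreteness is also why gradients from a discriminator cannot flow smoothly back through sampled tokens the way they do through pixel values, which is the core difficulty text-generating GANs must work around.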