Stable Diffusion for text-to-image generation
Text-to-image generation is a widely adopted use case of generative AI. Generating images from text, especially high-quality images, has many applications, from game design to marketing. To understand how this kind of model works, however, we first need to cover a few preliminaries about diffusion models in machine learning.
Diffusion in AI is a term borrowed from physics. In physics, diffusion describes how a substance, such as a drop of ink in water, gradually spreads out until it is evenly dispersed. A similar idea applies in AI. Take an ordinary image as our starting point. Forward diffusion is the process of repeatedly adding noise to the image. As seen in the following figure, each step adds a little more noise, so the image gradually loses its structure until it becomes indistinguishable from pure noise.
Figure 17.2 – Forward diffusion
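To make the forward process concrete, here is a minimal NumPy sketch. It assumes the Gaussian formulation popularized by DDPM, with a linear noise schedule running from 1e-4 to 0.02 over 1,000 steps. The text above does not specify a schedule, so these values and the helper names `linear_beta_schedule` and `forward_diffusion` are illustrative assumptions. Under this formulation, the noisy image at step t can be sampled directly from the original image as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the cumulative product of (1 - beta) up to step t.

```python
import numpy as np


def linear_beta_schedule(num_steps: int = 1000,
                         beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t, increasing linearly (assumed schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffusion(x0: np.ndarray, t: int, betas: np.ndarray,
                      rng: np.random.Generator) -> np.ndarray:
    """Sample the noisy image x_t from the clean image x0 in one shot.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    """
    alphas = 1.0 - betas
    # Cumulative product: fraction of the original signal variance kept at step t.
    alpha_bar = np.cumprod(alphas)[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "image": 64x64 grayscale with pixel values scaled to [-1, 1].
    image = rng.uniform(-1.0, 1.0, size=(64, 64))
    betas = linear_beta_schedule()
    for t in (0, 250, 500, 999):
        xt = forward_diffusion(image, t, betas, rng)
        # Correlation with the original shrinks as noise takes over.
        corr = np.corrcoef(image.ravel(), xt.ravel())[0, 1]
        print(f"t={t:4d}  correlation with original = {corr:+.3f}")
```

Using the closed-form expression, rather than adding noise one step at a time, lets a training loop jump straight to any step t. By the final step, almost none of the original signal remains, which matches the progression shown in the figure.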
This forward process will give us a different version...