Understanding classifier-guided denoising
So far, we haven’t discussed text guidance. The image generation process we have built takes random Gaussian noise as its only input and then generates an arbitrary image that reflects the training dataset. Usually, though, we want guided image generation; for example, we’d like to input “dog” and have the diffusion model generate an image that contains a dog.
In 2021, Dhariwal and Nichol, from OpenAI, proposed classifier guidance in their paper titled Diffusion Models Beat GANs on Image Synthesis [12].
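For reference, the guidance term proposed in that paper uses the gradient of a classifier p_φ(y | x_t), trained on noisy images, to shift the mean of each reverse (denoising) step toward the target label y, with s acting as a guidance scale:

\[
\hat{\mu}_\theta(x_t, t) = \mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,\nabla_{x_t}\log p_\phi(y \mid x_t)
\]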
Based on the proposed methodology, we can achieve classifier-guided denoising by providing a classification label during the training stage: instead of feeding only the image and the time-step embedding, we also feed a text-description (label) embedding, as shown in Figure 4.9.
Figure 4.9: Train a diffusion model with conditional text
In Figure 4.7, there are two inputs (the noisy image and the time-step embedding), while in Figure 4.9, there is one additional input...
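To make that extra input concrete, here is a minimal PyTorch-style sketch of a denoiser that receives a noisy image, a time step, and a class label such as “dog.” It is not the book’s exact model: the layer sizes, class names, and the use of a simple label embedding in place of a full text encoder are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy denoiser conditioned on a time step and a class label,
    mirroring the additional input shown in Figure 4.9."""
    def __init__(self, img_channels=3, emb_dim=128, num_classes=10, num_steps=1000):
        super().__init__()
        self.time_emb = nn.Embedding(num_steps, emb_dim)     # time-step embedding
        self.label_emb = nn.Embedding(num_classes, emb_dim)  # label ("text") embedding
        self.conv_in = nn.Conv2d(img_channels, 64, 3, padding=1)
        self.cond_proj = nn.Linear(emb_dim, 64)
        self.conv_out = nn.Conv2d(64, img_channels, 3, padding=1)

    def forward(self, noisy_images, t, labels):
        h = self.conv_in(noisy_images)
        # Sum the two conditioning signals and inject them into the feature map
        cond = self.time_emb(t) + self.label_emb(labels)
        h = h + self.cond_proj(cond)[:, :, None, None]
        return self.conv_out(h)  # predicted noise

# One conditional training input: noisy images, time steps, and labels
model = ConditionalDenoiser()
x_t = torch.randn(8, 3, 32, 32)        # noisy images
t = torch.randint(0, 1000, (8,))       # diffusion time steps
y = torch.randint(0, 10, (8,))         # class labels, e.g., the index for "dog"
predicted_noise = model(x_t, t, y)     # shape: (8, 3, 32, 32)
```

During training, the label is always supplied alongside the noisy image and the time step, so the model learns to predict noise for a specific class rather than for the dataset as a whole.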