Introducing diffusion models
In the previous chapters on GANs, we learned to generate images from noise, as well as from conditional inputs such as the class of image to be generated. In that setup, however, the image of a face emerged from random noise in a single step. What if we could generate an image from random noise more incrementally? For example, what if we could first generate the rough contours of the image and then gradually refine them into finer details over successive steps? Further, what if we could generate an image from text input? Diffusion models come in handy in such scenarios.
A diffusion model mimics a physical diffusion process, which refers to the gradual spread or dispersion of a quantity (in this case, noise across the pixel values of an image) over time.
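To make the idea concrete, the gradual spread of noise can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation of the forward (noising) direction only; the linear beta schedule, the 1,000 steps, and the `forward_diffusion` helper are assumptions for illustration, not the book's implementation.

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    """Diffuse a clean image x0 to timestep t in a single closed-form step.

    Uses the standard identity x_t = sqrt(alpha_bar_t) * x0
    + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the
    cumulative product of (1 - beta) up to step t.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]       # cumulative signal retained at step t
    eps = np.random.randn(*x0.shape)        # Gaussian noise, same shape as image
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Hypothetical setup: a linear schedule over 1,000 steps and a random "image"
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.random.rand(28, 28)

x_slightly_noisy = forward_diffusion(x0, t=10, betas=betas)    # mostly signal
x_mostly_noise = forward_diffusion(x0, t=999, betas=betas)     # mostly noise
```

At small `t` the output still resembles the original image; by the final step, almost none of the original signal remains, which is exactly the state a diffusion model learns to reverse.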
How diffusion models work
Imagine a scenario where you have a set of images. In...