ControlNet
Imagine a scenario where we want the subject of an image to have a certain pose that we prescribe it to have – ControlNet helps us to achieve that. In this section, we will learn about how to leverage a diffusion model and modify the architecture of ControlNet and achieve this objective.
Architecture
ControlNet works as follows:
- We take human images and pass them through the OpenPose model to get stick figures (keypoints) corresponding to the image. The OpenPose model is a pose detector that is very similar to the human pose detection model that we explored in Chapter 10.
- The inputs to the model are a stick figure and a prompt corresponding to the image, and the expected output is the original human image.
- We create a replica of the downsampling blocks of the UNet2DConditionModel (the copies of the downsampling blocks are shown in Figure 17.5).
- The replica blocks are passed through a zero-convolution layer...