Building a custom scheduled prompt pipeline
As we discussed in Chapter 5, the generation process uses the input prompt embedding to guide the denoising of an image at each step. By default, every denoising step uses the exact same embedding. However, to gain more precise control over the generation, we can modify the pipeline code to supply a different embedding for each denoising step.
Take, for instance, the following prompt:
[A photo of cat:A photo of dog:0.5]
Here, 0.5 is the scheduling number: the fraction of the denoising steps after which the prompt switches. With a total of 10 denoising steps, we want the pipeline to remove noise in the first 5 steps to reveal A photo of cat, and in the following 5 steps to reveal A photo of dog. To make this happen, we will need to implement the following components:
- A prompt parser capable of extracting the scheduling number from the prompt
- A method to embed the prompts and create a list of prompt embeddings that matches the number of steps
- A new pipeline class derived from the Diffusers pipeline, enabling us to incorporate our new...
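Before moving on to the full implementation, the first two components can be sketched in plain Python to make the idea concrete. The following is only a minimal illustration, assuming a prompt holds a single [first:second:number] group and nothing else. The names parse_scheduled_prompt and prompts_per_step are hypothetical helpers made up for this sketch and are not part of Diffusers. The sketch also stops at a per-step list of prompt strings; turning each string into an embedding is left to the text encoder.

```python
import re
from typing import List, Tuple

# Assumed syntax for this sketch: [first prompt:second prompt:number],
# where number is a fraction between 0 and 1.
_SCHEDULE_RE = re.compile(
    r"^\[(?P<first>[^:\[\]]+):(?P<second>[^:\[\]]+):(?P<when>[0-9]*\.?[0-9]+)\]$"
)


def parse_scheduled_prompt(prompt: str) -> Tuple[str, str, float]:
    """Split a scheduled prompt into (first prompt, second prompt, fraction).

    A prompt without the bracket syntax is returned as (prompt, prompt, 1.0),
    meaning the same text is used for every step.
    """
    match = _SCHEDULE_RE.match(prompt.strip())
    if match is None:
        return prompt, prompt, 1.0
    when = float(match.group("when"))
    if not 0.0 <= when <= 1.0:
        raise ValueError(f"scheduling number must be in [0, 1], got {when}")
    return match.group("first").strip(), match.group("second").strip(), when


def prompts_per_step(prompt: str, num_steps: int) -> List[str]:
    """Return one prompt string per denoising step."""
    first, second, when = parse_scheduled_prompt(prompt)
    # Steps before the switch point use the first prompt, the rest the second.
    switch_step = round(when * num_steps)
    return [first if step < switch_step else second for step in range(num_steps)]


if __name__ == "__main__":
    schedule = prompts_per_step("[A photo of cat:A photo of dog:0.5]", 10)
    for step, text in enumerate(schedule):
        print(step, text)
```

With 10 steps and a scheduling number of 0.5, steps 0 to 4 receive A photo of cat and steps 5 to 9 receive A photo of dog. Keeping the parser separate from the embedding step means the list of strings can be inspected and tested without loading any model.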