Implementing a text-guided image-to-image Stable Diffusion inference pipeline
The only thing left to do now is blend the starting image latent with the initial latent noise. The latents_input
torch tensor is the latent we encoded from a dog image earlier in this chapter:
strength = 0.7
# scale the initial noise by the standard deviation required by the scheduler
latents = latents_input * (1 - strength) + \
    noise_tensor * scheduler.init_noise_sigma
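To see what this blending does, here is a minimal sketch using NumPy arrays as stand-ins for the real latent tensors. The shapes are illustrative only, and init_noise_sigma is assumed to be 1.0 (its value for schedulers such as PNDM); the real value comes from the scheduler you load.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real tensors: a 4-channel 64x64 latent
# and Gaussian noise of the same shape (shapes are illustrative).
latents_input = rng.standard_normal((1, 4, 64, 64))
noise_tensor = rng.standard_normal((1, 4, 64, 64))
init_noise_sigma = 1.0  # assumption: 1.0, as in schedulers like PNDM

strength = 0.7
latents = latents_input * (1 - strength) + noise_tensor * init_noise_sigma

# With strength = 0.7, only 30% of the original latent survives, so the
# blended result correlates more with the noise than with the image latent.
corr_image = np.corrcoef(latents.ravel(), latents_input.ravel())[0, 1]
corr_noise = np.corrcoef(latents.ravel(), noise_tensor.ravel())[0, 1]
print(corr_image < corr_noise)  # True
```

This makes the role of strength concrete: it controls how much of the encoded image leaks into the starting latent before denoising begins.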
That is all that is necessary; use the same code from the text-to-image pipeline, and you should generate something like Figure 5.4:
Figure 5.4: A running dog, generated by a custom image-to-image Stable Diffusion pipeline
Note that the preceding code uses strength = 0.7; the strength denotes the weight of the initial latent noise. If you want an image more similar to the initial image (the image you provided to the image-to-image pipeline), use a lower strength value; otherwise, use a higher one.
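The effect of strength can be sketched numerically. The snippet below again uses NumPy stand-ins for the latents (shapes and init_noise_sigma = 1.0 are assumptions, as before) and measures how far the blended starting latent drifts from the encoded image latent as strength grows.

```python
import numpy as np

rng = np.random.default_rng(42)
latents_input = rng.standard_normal((1, 4, 64, 64))
noise_tensor = rng.standard_normal((1, 4, 64, 64))
init_noise_sigma = 1.0  # assumption, scheduler-dependent in practice


def blended_distance(strength):
    # Distance between the blended starting latent and the image latent.
    latents = latents_input * (1 - strength) + noise_tensor * init_noise_sigma
    return float(np.linalg.norm(latents - latents_input))


# A lower strength keeps the starting latent closer to the encoded image,
# so the generated output resembles the initial image more.
d_low, d_high = blended_distance(0.3), blended_distance(0.9)
print(d_low < d_high)  # True
```

The hypothetical blended_distance helper is only for illustration; in the real pipeline the trade-off shows up as output images that track the input photo more or less closely.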