Optimizing Performance and VRAM Usage
In the previous chapters, we covered the theory behind Stable Diffusion models, introduced the Stable Diffusion model data format, and discussed model conversion and loading. Even though Stable Diffusion performs denoising in the latent space, by default the model's weights and execution still require substantial resources and may occasionally throw a `CUDA out of memory` error.
To enable fast and smooth image generation with Stable Diffusion, several techniques can optimize the overall process, boost inference speed, and reduce VRAM usage. In this chapter, we are going to cover the following optimization solutions and discuss how well they work in practice:
- Using float16 or bfloat16 data type
- Enabling VAE tiling
- Enabling xFormers or using PyTorch 2.0's scaled dot-product attention
- Enabling sequential CPU offload
- Enabling model CPU offload
- Token merging (ToMe)
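To preview the first item on the list, the sketch below shows why switching to float16 or bfloat16 roughly halves weight memory: each parameter shrinks from 4 bytes to 2. The parameter count used here is an illustrative figure (the Stable Diffusion v1.x UNet has roughly 860 million parameters), and the commented diffusers call is an assumption about the loading API covered in the earlier chapters.

```python
import torch

# Illustrative parameter count: the Stable Diffusion v1.x UNet has
# roughly 860M parameters.
n_params = 860_000_000

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    # element_size() reports bytes per element for the given dtype.
    bytes_per_param = torch.tensor([], dtype=dtype).element_size()
    gib = n_params * bytes_per_param / 1024**3
    print(f"{dtype}: {bytes_per_param} bytes/param, ~{gib:.2f} GiB of weights")

# In diffusers, the same idea is applied at load time by passing a
# torch_dtype, e.g. (hypothetical call, model name for illustration):
#   pipe = StableDiffusionPipeline.from_pretrained(
#       "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
```

Note that bfloat16 uses the same 2 bytes per parameter as float16 but keeps float32's exponent range, which makes it less prone to overflow at the cost of precision; the chapters that follow discuss when each is appropriate.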
By using some of these solutions...