Optimization solution 5 – enabling model CPU offload
Full model offloading moves the whole model on and off the GPU, instead of moving only the weights. If offloading is not enabled, all model data stays in GPU VRAM before and after forward inference, and clearing the CUDA cache will not free that VRAM either. This can lead to a CUDA Out of memory error if you load additional models, say, an upscaler model to further process the image. The model-to-CPU offload method can mitigate the CUDA Out of memory problem.
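To see this for yourself, here is a minimal sketch (the prompt and memory check are illustrative and assume a CUDA device; the checkpoint is the same one used in the listing below) showing that, with a pipeline kept on the GPU, the weights remain in VRAM even after the CUDA cache is emptied:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")
image = pipe("a photo of a cat").images[0]

# Emptying the cache releases only unused cached blocks; the model weights
# themselves are still held on the GPU, so the allocation stays high.
torch.cuda.empty_cache()
print(f"VRAM still allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")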
Because this method shuttles the model between CPU RAM and GPU VRAM, expect an additional one to two seconds per generation.
To enable this method, remove text2img_pipe.to("cuda") and add text2img_pipe.enable_model_cpu_offload():
import torch
from diffusers import StableDiffusionPipeline

text2img_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
text2img_pipe.enable_model_cpu_offload()
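As a quick sanity check (a sketch, not part of the original listing; the prompt and file name are arbitrary), you can time a generation and inspect VRAM afterwards to see both the one-to-two-second offloading overhead and the memory that is released once the components move back to CPU RAM:

import time

prompt = "a photograph of an astronaut riding a horse"

start = time.perf_counter()
image = text2img_pipe(prompt).images[0]
print(f"Generation took {time.perf_counter() - start:.1f} s")  # includes the offloading overhead

image.save("astronaut.png")

# With model CPU offload enabled, the UNet, text encoder, and VAE are moved back
# to CPU RAM around the forward pass, so the remaining VRAM allocation should be
# far below the full model footprint, leaving room for, say, an upscaler model.
print(f"VRAM allocated after inference: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")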