Technical requirements
Before diving into BLIP-2 and LLaVA, let's use Stable Diffusion to generate an image for testing.
First, load up a deliberate-v2 model without sending it to CUDA:
import torch
from diffusers import StableDiffusionPipeline

text2img_pipe = StableDiffusionPipeline.from_pretrained(
    "stablediffusionapi/deliberate-v2",
    torch_dtype=torch.float16
)
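At this point, the pipeline's weights still live in CPU RAM, so no VRAM has been consumed yet. As an optional sanity check (not part of the original steps), you can confirm the device before moving anything to the GPU:

# Verify that the pipeline has not been moved to the GPU yet
print(text2img_pipe.device)  # expected output: cpu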
Next, in the following code, we first send the model to CUDA and generate an image; we then offload the model to CPU RAM and clear it out of CUDA:
text2img_pipe.to("cuda:0")
prompt = "high resolution, a photograph of an astronaut riding a horse"
input_image = text2img_pipe(
    prompt=prompt,
    generator=torch.Generator("cuda:0").manual_seed(100),
    height=512,
    width=768
).images[0]
# Offload the model to CPU RAM and release the VRAM it occupied
text2img_pipe.to("cpu")
torch.cuda.empty_cache()
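To double-check that the VRAM has actually been released, and to keep the generated image around for the captioning experiments that follow, a small sketch like the following can help. The file name astronaut.png is an assumption for illustration, not from the original text:

# Persist the test image for the BLIP-2 / LLaVA experiments
input_image.save("astronaut.png")  # hypothetical file name

# Report how much GPU memory is still allocated; it should be
# close to zero now that the pipeline lives on the CPU again
print(f"{torch.cuda.memory_allocated(0) / 1024**2:.1f} MiB allocated")

Freeing the GPU this way matters here because BLIP-2 and LLaVA will need that VRAM themselves in the next sections.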