Overcoming 77-Token Limitations and Enabling Prompt Weighting
From Chapter 5, we know that Stable Diffusion utilizes OpenAI’s CLIP model as its text encoder. The CLIP model’s tokenization implementation, as per the source code [6], has a context length of 77 tokens.
This 77-token limit in the CLIP model carries over to Hugging Face Diffusers, restricting input prompts to a maximum of 77 tokens. In addition, the default pipeline provides no way to assign weights to individual keywords in the prompt; overcoming both limitations requires some modifications.
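To see where the limit comes from, you can inspect CLIP's tokenizer directly. The following is a minimal sketch, assuming the openai/clip-vit-large-patch14 tokenizer that Stable Diffusion v1.5-family checkpoints (such as deliberate-v2) are built on:

from transformers import CLIPTokenizer

# Load the CLIP tokenizer used by Stable Diffusion v1.5-family checkpoints
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# The context window CLIP was trained with: 77 tokens
print(tokenizer.model_max_length)    # 77

# A long prompt tokenizes to far more than 77 tokens
long_prompt = "a photo of a cat and a dog driving an aircraft " * 20
print(len(tokenizer(long_prompt).input_ids))    # well over 77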
For instance, suppose you provide a prompt string that produces more than 77 tokens, like this:
from diffusers import StableDiffusionPipeline
import torch

# Load a Stable Diffusion v1.5-based checkpoint in half precision on the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "stablediffusionapi/deliberate-v2",
    torch_dtype=torch.float16
).to("cuda")

# Repeating the phrase 20 times produces far more than 77 tokens
prompt = "a photo of a cat and a dog driving an aircraft " * 20
image = pipe(prompt).images[0]
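If you run this code, you should see a warning from the pipeline indicating that part of the input was truncated because CLIP can only handle sequences of up to 77 tokens. In other words, everything beyond the first 77 tokens is silently dropped and has no effect on the generated image.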