Building Optimized Prompts for Stable Diffusion
In Stable Diffusion v1.5 (SD v1.5), crafting prompts that generate the image you intend can be challenging, and it is not uncommon to see impressive images emerge from complex, unusual word combinations. This is largely due to the text encoder used in SD v1.5 – OpenAI's CLIP model. CLIP was trained on captioned images scraped from the internet, and many of those captions are loose lists of tags rather than structured sentences.
When using SD v1.5, we must not only memorize a plethora of "magical" keywords but also learn to combine these tag-like words effectively. SDXL's dual text encoders – OpenAI's CLIP and the larger OpenCLIP – are considerably more capable than the single encoder in SD v1.5. However, we still need to follow certain guidelines to write effective prompts.
In this chapter, we will cover the fundamental principles for creating dedicated prompts and then explore powerful large language model (LLM) techniques to help us write and refine them.