How does LoRA work?
LoRA (Low-Rank Adaptation) is a technique for quickly fine-tuning large models, first introduced for large language models by Microsoft researchers in a paper by Edward J. Hu et al. [1] and since widely adopted for diffusion models. It works by training a small, low-rank update that adapts the model to a specific concept. This small model can be merged with the main checkpoint model to generate images similar to those used to train the LoRA.
Let’s use W to denote the original UNet attention weights (Q, K, V), ΔW to denote the weight update learned during LoRA fine-tuning, and W′ to denote the merged weights. Adding a LoRA to a model can then be expressed like this:
W′ = W + ΔW
If we want to control the strength of the LoRA weights, we introduce a scale factor α. Adding a LoRA to the model can now be expressed like this:
W′ = W + αΔW
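As a minimal sketch of what this merge looks like in code, assuming we already have the base weight and the LoRA update as plain PyTorch tensors (the function name, shapes, and values below are illustrative, not the actual checkpoint layout):

```python
import torch

def merge_lora_weight(W: torch.Tensor, delta_W: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Return the merged weight W' = W + alpha * delta_W."""
    return W + alpha * delta_W

# Illustrative shapes only: one 768x768 attention projection (e.g. a Q matrix).
W = torch.randn(768, 768)
delta_W = torch.randn(768, 768) * 0.01

# alpha = 0 keeps the original model; alpha = 1.0 applies the full LoRA update.
W_merged = merge_lora_weight(W, delta_W, alpha=0.8)
```

In a real checkpoint this merge would be repeated for every attention projection the LoRA was trained on, but the arithmetic per matrix is exactly the formula above.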
The range of α is typically from 0 to 1.0 [2], and setting it slightly larger than 1.0 is usually fine as well. The reason a LoRA file is so small is that ΔW can be represented by two small matrices...
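To make that size argument concrete, here is a hedged sketch of the low-rank factorization: instead of storing the full ΔW, a LoRA stores two much smaller matrices whose product reconstructs it. The names A and B, the dimension, and the rank below are illustrative choices, not values prescribed by the paper:

```python
import torch

d = 768   # width of one attention projection (illustrative)
r = 8     # LoRA rank, chosen to be much smaller than d

# LoRA stores only these two small matrices...
A = torch.randn(d, r) * 0.01   # d x r
B = torch.randn(r, d) * 0.01   # r x d

# ...and the full-size update is reconstructed as their product.
delta_W = A @ B                # d x d, the same shape as W

full_params = d * d            # 589,824 values for a full-rank update
lora_params = d * r + r * d    # 12,288 values actually stored by the LoRA
print(f"full ΔW: {full_params:,} params, LoRA factors: {lora_params:,} params")
```

With these illustrative numbers, the two factors hold roughly 2% of the parameters of the full update, which is why LoRA files are so much smaller than full checkpoints.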