Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Authors: Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
What
This paper introduces Concept Sliders, a method that fine-tunes low-rank adaptations (LoRA) of a diffusion model to enable precise, interpretable, and continuously adjustable control over attributes of generated images.
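Mechanically, a slider is a low-rank update to selected weight matrices whose scale is exposed as the slider value. A minimal sketch of this idea, in our own notation rather than the authors' code (`down`, `up`, and `alpha` are illustrative names):

```python
import torch

class LoRASliderLinear(torch.nn.Module):
    """Wraps a frozen linear layer with a rank-r update W + alpha * B @ A.

    Illustrative only: this is a simplified restatement of the LoRA
    mechanism, not the authors' implementation.
    """

    def __init__(self, base: torch.nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weight W
        self.down = torch.nn.Linear(base.in_features, rank, bias=False)  # A
        self.up = torch.nn.Linear(rank, base.out_features, bias=False)   # B
        torch.nn.init.zeros_(self.up.weight)    # identity edit at initialization
        self.alpha = 1.0                        # slider value, set at inference

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # alpha sweeps the concept continuously; negative values reverse it
        return self.base(x) + self.alpha * self.up(self.down(x))
```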
Why
This work is significant because it addresses limitations of existing diffusion model editing techniques by providing: 1) fine-grained control over continuous attributes, 2) composability for multi-attribute editing, 3) ability to learn visual concepts from image pairs, 4) transfer of style latents from GANs, and 5) improvement of image quality by fixing common distortions.
How
The authors train LoRA adaptors using a guided score function that amplifies a target attribute while suppressing its opposite and preserving unrelated attributes. Concepts can be defined by text prompt pairs, by image pairs, or by StyleGAN latents. They evaluate their method on Stable Diffusion XL and SD v1.4, measuring CLIP score change and LPIPS distance, and conducting user studies to assess image quality.
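As we read the method, the guided score objective shifts the frozen model's noise prediction for the target prompt along the direction from an attribute-suppressing prompt to an attribute-enhancing one, and trains the LoRA-adapted model to match that shifted target. A schematic PyTorch sketch with our own function and embedding names, omitting the preservation terms over protected attributes for brevity:

```python
import torch
import torch.nn.functional as F

def slider_loss(frozen_unet, slider_unet, x_t, t,
                emb_target, emb_enhance, emb_suppress, eta=1.0):
    """One training step for a text-based slider (our simplification).

    frozen_unet : pretrained noise predictor, weights fixed
    slider_unet : the same UNet with the LoRA slider enabled
    emb_*       : text embeddings for the target prompt and for the
                  attribute-enhancing / attribute-suppressing prompt pair
    """
    with torch.no_grad():
        eps = frozen_unet(x_t, t, emb_target)          # baseline prediction
        eps_plus = frozen_unet(x_t, t, emb_enhance)    # concept strengthened
        eps_minus = frozen_unet(x_t, t, emb_suppress)  # concept weakened
        # Shift the baseline along the concept direction, scaled by eta
        target = eps + eta * (eps_plus - eps_minus)
    pred = slider_unet(x_t, t, emb_target)             # adapted prediction
    return F.mse_loss(pred, target)
```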
Result
Key findings include: 1) Concept Sliders enable precise control over various attributes, 2) image-based sliders effectively capture visual concepts, 3) StyleGAN latents can be transferred to diffusion models for nuanced style editing, and 4) sliders can fix hand distortions and enhance overall realism, as confirmed by user studies.
Limitations & Future Work
Limitations include residual interference between edits and a potential trade-off between edit strength and structural coherence when using the SDEdit technique. Future work could explore automated methods for minimizing interference and improving edit strength without sacrificing image structure.
Abstract
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either textual or visual concepts. Concept Sliders are plug-and-play: they can be composed efficiently and continuously modulated, enabling precise control over image generation. In quantitative experiments comparing to previous editing techniques, our sliders exhibit stronger targeted edits with lower interference. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We show how sliders can transfer latents from StyleGAN for intuitive editing of visual concepts for which textual description is difficult. We also find that our method can help address persistent quality issues in Stable Diffusion XL including repair of object deformations and fixing distorted hands. Our code, data, and trained sliders are available at https://sliders.baulab.info/
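To illustrate the plug-and-play claim, a usage sketch assuming a recent diffusers release with PEFT-backed LoRA support; the slider file names here are hypothetical, and the project site above distributes the actual trained sliders and loading code:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Hypothetical slider files; see https://sliders.baulab.info/ for real ones.
pipe.load_lora_weights("sliders/age.safetensors", adapter_name="age")
pipe.load_lora_weights("sliders/smile.safetensors", adapter_name="smile")

# Compose two sliders; each weight continuously modulates its concept,
# and negative values push the attribute in the opposite direction.
pipe.set_adapters(["age", "smile"], adapter_weights=[1.5, -0.5])

image = pipe("a photo of a person", num_inference_steps=30).images[0]
image.save("edited.png")
```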