Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Authors: Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
What
This paper introduces AlignProp, a novel method for aligning text-to-image diffusion models with downstream reward functions via end-to-end backpropagation of reward gradients through the denoising process, keeping memory usage tractable by finetuning only low-rank adapter (LoRA) weights and using gradient checkpointing.
Why
This work is important because it provides a more efficient and effective way to adapt pretrained diffusion models to downstream objectives such as aesthetics, image-text semantic alignment, or ethical image generation, which are difficult to achieve with standard training methods given the models' unsupervised pretraining.
How
The authors frame denoising inference as a differentiable recurrent policy and train it by backpropagating gradients of a reward function end-to-end through the sampling chain. To keep memory usage tractable, they fine-tune only low-rank adapter (LoRA) weights and use gradient checkpointing. To prevent overfitting to (over-optimization of) the reward function, they introduce randomized truncated backpropagation through time, backpropagating through only a randomly sampled number of the final denoising steps.
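A minimal, self-contained sketch of how these pieces fit together, in PyTorch. It is not the paper's implementation: the toy denoiser, step size, reward, and hyperparameters below are illustrative stand-ins for the actual Stable Diffusion U-Net and differentiable reward models, but the structure (frozen base weights plus trainable LoRA, gradient checkpointing, randomized truncated backprop through the last K denoising steps, and a reward-gradient update) mirrors the method described above.

```python
# Sketch of AlignProp-style reward backpropagation (toy denoiser, toy reward).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ToyDenoiser(nn.Module):
    """Stand-in for the pretrained U-Net; only the low-rank adapter is trained."""
    def __init__(self, dim=64, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)                     # frozen "pretrained" weights
        self.lora_down = nn.Linear(dim, rank, bias=False)   # trainable LoRA factors
        self.lora_up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.lora_up.weight)                 # adapter starts as a no-op
        self.base.requires_grad_(False)

    def forward(self, x, t):
        return self.base(x) + self.lora_up(self.lora_down(x))

def toy_reward(x):
    """Placeholder for a differentiable reward (aesthetics, CLIP alignment, ...)."""
    return -(x ** 2).mean(dim=-1)                           # hypothetical: prefer small-norm samples

def rollout_and_loss(model, batch_size=8, dim=64, T=50):
    x = torch.randn(batch_size, dim)                        # start from pure noise x_T
    # Randomized truncated BPTT: only the last K denoising steps keep gradients.
    K = torch.randint(1, T + 1, ()).item()
    for step in range(T):
        t = torch.full((batch_size,), T - step)
        if step < T - K:
            with torch.no_grad():                           # early steps: no gradient, no stored activations
                x = x - 0.02 * model(x, t)
        else:
            # Gradient checkpointing: recompute activations in the backward pass
            # instead of storing them, keeping memory usage viable.
            x = x - 0.02 * checkpoint(model, x, t, use_reentrant=False)
    return -toy_reward(x).mean()                            # minimize negative reward

model = ToyDenoiser()
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
for it in range(3):
    opt.zero_grad()
    loss = rollout_and_loss(model)
    loss.backward()                                         # end-to-end reward gradient through denoising
    opt.step()
    print(f"iter {it}: loss {loss.item():.4f}")
```

In the real setting the frozen base would be the text-conditioned U-Net, the reward a pretrained scorer (e.g. an aesthetics or image-text alignment model), and the update would touch only the LoRA parameters, exactly as the restricted optimizer above does.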
Result
AlignProp achieves higher reward scores and converges faster than reinforcement learning baselines like DDPO. It also demonstrates better generalization to new prompts and is preferred by human evaluators for fidelity and image-text alignment. The paper shows that mixing weights of models finetuned on different reward functions allows for interpolation between these objectives.
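The weight-mixing result amounts to convexly combining the adapter weights of two reward-specific finetunes. A hedged sketch, assuming the finetuned adapters are available as plain state dicts (the file names and the `alpha` value below are illustrative, not from the paper):

```python
# Hypothetical sketch: interpolate between two reward-specific LoRA finetunes.
import torch

def mix_lora_weights(state_a, state_b, alpha):
    """Return (1 - alpha) * A + alpha * B, key by key, for matching state dicts."""
    assert state_a.keys() == state_b.keys()
    return {k: (1.0 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

# e.g. adapters finetuned on aesthetics vs. image-text alignment (hypothetical paths):
# aesthetic_lora = torch.load("lora_aesthetic.pt")
# alignment_lora = torch.load("lora_alignment.pt")
# mixed = mix_lora_weights(aesthetic_lora, alignment_lora, alpha=0.5)
# model.load_state_dict(mixed, strict=False)   # apply only to the adapter layers
```

Sweeping `alpha` from 0 to 1 then trades off one objective against the other without any additional training.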
Limitations & Future Work
The authors acknowledge the limitation of potential over-optimization when the reward function is imperfect and suggest that mitigating this risk is an area for future work. Additionally, extending AlignProp to diffusion-based language models for improved alignment with human feedback is another promising direction.
Abstract
Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of its gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While a naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility, and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and visualization results are available at https://align-prop.github.io/.