Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Authors: Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

What

This paper introduces DRaFT, a family of methods for efficiently fine-tuning diffusion models to maximize differentiable reward functions, such as human preference scores, by backpropagating reward gradients through the sampling process, leading to improved generation quality.
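
As a rough sketch of the objective (notation assumed here, not taken verbatim from the paper), DRaFT treats the sampled image as a differentiable function of the model parameters and directly ascends the reward gradient:

$$\max_{\theta}\;\mathbb{E}_{c,\;x_T \sim \mathcal{N}(0, I)}\big[\, r\big(\mathrm{sample}(\theta, c, x_T),\, c\big)\,\big]$$

where sample(θ, c, x_T) runs the full denoising chain from noise x_T to the image x_0, so the gradient of the reward r with respect to θ is obtained by backpropagating through every sampler step.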

Why

This paper is important because it offers a more efficient and scalable alternative to reinforcement learning for aligning diffusion model outputs with human preferences or other complex objectives, which is crucial for deploying these models in real-world applications.

How

The authors propose DRaFT, which backpropagates reward gradients through the sampling chain, using LoRA and gradient checkpointing for efficiency. They introduce two variants: DRaFT-K, which truncates backpropagation to the last K sampling steps, and DRaFT-LV, which reduces gradient variance in the K=1 case. They evaluate these methods on Stable Diffusion 1.4 with various reward functions, including aesthetic scores, human preference models (PickScore, HPSv2), and objectives such as image compressibility and adversarial example generation; a minimal sketch of the truncation idea follows.
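
The following is a minimal DRaFT-K-style sketch in PyTorch, not the authors' implementation: the denoiser, sampler step, and reward below are toy stand-ins for Stable Diffusion's LoRA-augmented UNet, a DDIM update, and a preference model such as PickScore. Setting K = T would recover full backpropagation through sampling.

```python
# Minimal DRaFT-K sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

T, K = 50, 1                      # total sampling steps, truncation length
denoiser = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))

def reward(x):
    # Stand-in for a differentiable reward (e.g. an aesthetic / preference score).
    return -(x ** 2).mean()

def sampler_step(x, t):
    # Placeholder update; a real sampler would use the noise schedule at step t.
    eps = denoiser(x)
    return x - eps / T

opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)
for _ in range(100):
    x = torch.randn(8, 16)        # start from Gaussian noise
    # Run the first T-K steps without building a graph (gradient is truncated here).
    with torch.no_grad():
        for t in reversed(range(K, T)):
            x = sampler_step(x, t)
    # Only the last K steps are differentiated through.
    for t in reversed(range(K)):
        x = sampler_step(x, t)
    loss = -reward(x)             # maximize reward = minimize negative reward
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Truncating the backward pass to the last K steps avoids differentiating through the earlier steps, which is what makes the method cheap relative to backpropagating through the full chain.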

Result

DRaFT significantly outperforms RL methods in sample efficiency when maximizing aesthetic scores. DRaFT-LV achieves the best reward on the HPSv2 benchmark, learning faster than the other methods evaluated. The authors demonstrate DRaFT on tasks such as generating compressible or incompressible images, manipulating object presence via object detectors, and creating adversarial examples. They also show that scaling the LoRA weights controls the strength of fine-tuning and allows combining models trained with different rewards, as illustrated below.
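
As a hypothetical illustration of the LoRA scaling and mixing idea (the names and shapes below are made up, not the paper's API): the effective weight is the frozen base weight plus a scaled low-rank update, so a scale below 1 interpolates toward the base model, and updates trained on different rewards can be summed with their own scales.

```python
import torch

d, r = 16, 4                                    # feature dim, LoRA rank
W_base = torch.randn(d, d)                      # frozen pretrained weight
A1, B1 = torch.randn(r, d), torch.randn(d, r)   # LoRA update trained on reward 1
A2, B2 = torch.randn(r, d), torch.randn(d, r)   # LoRA update trained on reward 2

def merged_weight(alpha1=1.0, alpha2=0.0):
    # Effective weight: base plus scaled low-rank deltas from each fine-tuned model.
    return W_base + alpha1 * (B1 @ A1) + alpha2 * (B2 @ A2)

W_strong = merged_weight(alpha1=1.0)                 # full strength of reward-1 fine-tuning
W_soft   = merged_weight(alpha1=0.3)                 # partially revert toward the base model
W_mixed  = merged_weight(alpha1=0.5, alpha2=0.5)     # combine both rewards
```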

Limitations and Future Work

The paper acknowledges the issue of reward hacking, where models exploit limitations of the reward function; future work could address this by developing more robust reward functions. The authors also point to improving text alignment by using powerful image-captioning models as another research direction.

Abstract

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.