Adversarial Diffusion Distillation
Authors: Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach
What
This paper introduces Adversarial Diffusion Distillation (ADD), a training approach that distills a large pre-trained diffusion model into a student capable of generating high-quality images in just 1-4 sampling steps, by combining adversarial training with score distillation from the pre-trained teacher.
Why
The paper addresses a key limitation of current diffusion models: slow inference caused by their iterative sampling process. ADD offers a way to achieve real-time, high-quality image synthesis with foundation models.
How
The authors train a student model, initialized from a pre-trained diffusion model, with a hybrid loss consisting of two components: an adversarial loss that pushes the student to generate realistic images, and a score distillation loss that leverages the knowledge of a frozen pre-trained teacher diffusion model. The student denoises inputs that have been noised at timesteps drawn from a small set, using the same diffusion coefficients as the teacher; for the distillation loss, the student's outputs are re-noised and regressed toward the teacher's denoising predictions.
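To make the objective concrete, here is a minimal PyTorch-style sketch of one training step with the hybrid loss. It is a simplified illustration, not the paper's implementation: the names (add_training_step, lambda_distill, alphas, sigmas, student_steps) are assumptions, a non-saturating GAN loss stands in for the paper's hinge loss and feature-based discriminator, and the distillation weighting function is collapsed into a constant.

```python
import torch
import torch.nn.functional as F

def add_training_step(student, teacher, discriminator, x0,
                      alphas, sigmas, student_steps, lambda_distill=1.0):
    """One simplified ADD training step (illustrative sketch).

    student, teacher: denoisers mapping (noisy image, timestep) -> predicted
    clean image; the teacher is a frozen pre-trained diffusion model.
    discriminator: scores realism of images (scalar logit per sample).
    alphas, sigmas: 1-D tensors of diffusion coefficients (shared with the teacher).
    student_steps: 1-D LongTensor holding the small set of student timesteps.
    """
    b = x0.shape[0]

    # 1) Forward-diffuse real data to a student timestep s; the student denoises it.
    s = student_steps[torch.randint(len(student_steps), (b,))]
    eps = torch.randn_like(x0)
    x_s = alphas[s, None, None, None] * x0 + sigmas[s, None, None, None] * eps
    x_student = student(x_s, s)

    # 2) Adversarial loss: push the discriminator to rate the student sample
    #    as real (non-saturating generator loss; the paper uses a hinge loss).
    loss_adv = F.softplus(-discriminator(x_student)).mean()

    # 3) Score distillation: re-noise the student's output at a teacher
    #    timestep t and regress toward the frozen teacher's prediction.
    t = torch.randint(len(alphas), (b,))
    eps_t = torch.randn_like(x_student)
    x_t = (alphas[t, None, None, None] * x_student
           + sigmas[t, None, None, None] * eps_t)
    with torch.no_grad():
        x_teacher = teacher(x_t, t)
    loss_distill = F.mse_loss(x_student, x_teacher)

    return loss_adv + lambda_distill * loss_distill
```

Note that the teacher's prediction is computed without gradients, so the distillation gradient flows only through the student's sample, mirroring the stop-gradient in the paper's formulation.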
Result
ADD outperforms existing few-step methods like Latent Consistency Models (LCMs) and GANs in single-step image synthesis. Notably, with four sampling steps, ADD-XL surpasses the performance of its teacher model, SDXL-Base, demonstrating its capability to generate high-fidelity images efficiently.
Limitations & Future Work
The authors note that different distillation weighting functions and scheduling strategies could be explored for further performance gains. Future work could also investigate applying ADD to other domains such as video and 3D generation.
Abstract
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models. Code and weights available under https://github.com/Stability-AI/generative-models and https://huggingface.co/stabilityai/.
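As a usage note, here is a hedged sketch of sampling with the released weights, assuming the ADD-distilled SDXL checkpoint published as stabilityai/sdxl-turbo and the Hugging Face diffusers library. Single-step generation looks roughly like this:

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the ADD-distilled model (assumed to be the stabilityai/sdxl-turbo release).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Single-step sampling; guidance_scale=0.0 because ADD-distilled models
# are trained without classifier-free guidance.
image = pipe(
    prompt="a photo of an astronaut riding a horse",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("add_sample.png")
```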