SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Authors: Shanchuan Lin, Anran Wang, Xiao Yang
What
This paper introduces SDXL-Lightning, a novel diffusion distillation method that achieves state-of-the-art performance in one-step/few-step 1024px text-to-image generation based on SDXL.
Why
This work addresses a key limitation of existing diffusion models: generating high-quality images requires many inference steps. By reducing generation to one or a few steps, SDXL-Lightning offers a significant speed and computational advantage over previous methods.
How
The authors propose a progressive adversarial diffusion distillation method. The approach combines progressive distillation with an adversarial loss function and uses a pre-trained diffusion UNet encoder as the discriminator backbone, enabling efficient distillation in latent space. The method progressively distills the model from 128 steps to 1 step, using both conditional and unconditional adversarial objectives to balance image quality and mode coverage.
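The progressive schedule described above (halving from 128 steps down to 1, with a blended conditional/unconditional adversarial objective at each stage) can be sketched in a few lines of illustrative Python. This is a minimal sketch of the schedule and loss blending only, not the paper's implementation; the function names and the weighting knob are hypothetical:

```python
def distillation_stages(start_steps: int = 128, end_steps: int = 1):
    """Yield (teacher_steps, student_steps) pairs: at each stage the
    student learns to cover two teacher steps with one, halving the
    step count until it reaches end_steps (128 -> 64 -> ... -> 1)."""
    stages = []
    steps = start_steps
    while steps > end_steps:
        stages.append((steps, steps // 2))
        steps //= 2
    return stages


def blended_adversarial_loss(cond_loss: float, uncond_loss: float,
                             cond_weight: float = 0.5) -> float:
    """Illustrative blend of the conditional objective (drives image
    quality) and the unconditional objective (drives mode coverage).
    The weighting scheme here is a made-up placeholder."""
    return cond_weight * cond_loss + (1.0 - cond_weight) * uncond_loss


for teacher_steps, student_steps in distillation_stages():
    print(f"distill {teacher_steps}-step teacher -> {student_steps}-step student")
```

Each stage's student becomes the next stage's teacher, so the full run distills 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1.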
Result
The resulting SDXL-Lightning models achieve state-of-the-art performance in one-step/few-step 1024px text-to-image generation, exceeding the quality of previous methods like SDXL-Turbo and LCM. The models demonstrate superior high-resolution detail preservation while maintaining comparable text alignment and diversity. Notably, they even surpass the original SDXL model in quality for 4-step and 8-step generation.
Limitations & Future Work
The paper acknowledges limitations, including the need for separate checkpoints for different inference steps and the potential for further improvement in the UNet architecture for one-step generation. Future work could explore distilling models with multiple aspect ratios and researching optimal architectures for one-step generation.
Abstract
We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.