One-step Diffusion with Distribution Matching Distillation

Authors: Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, Taesung Park

What

This paper introduces Distribution Matching Distillation (DMD), a method for converting a diffusion model into a one-step image generator with minimal quality loss. DMD minimizes an approximate KL divergence between the real and generated image distributions, with the gradient estimated using a pair of diffusion models.
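In symbols, a hedged sketch of the objective described above and in the abstract (the time weighting w_t, the noise schedule (α_t, σ_t), and the expectation over noise levels are illustrative assumptions, not the paper's exact notation):

```latex
% G_theta: one-step generator; s_real, s_fake: scores of the diffused real and
% generated distributions, each parameterized by a diffusion model.
\[
  \mathcal{L}_{\mathrm{DMD}}(\theta)
    = \mathbb{E}_{t}\big[\, D_{\mathrm{KL}}\!\left( p_{\mathrm{fake},t} \,\|\, p_{\mathrm{real},t} \right) \big],
  \qquad
  \nabla_\theta \mathcal{L}_{\mathrm{DMD}}
    \approx \mathbb{E}_{z,t,\epsilon}\!\left[ w_t \left( s_{\mathrm{fake}}(x_t, t) - s_{\mathrm{real}}(x_t, t) \right)
      \frac{\partial G_\theta(z)}{\partial \theta} \right],
\]
\[
  x_t = \alpha_t\, G_\theta(z) + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).
\]
```

Descending this gradient pushes each generated sample toward the real score and away from the fake score, i.e., toward the data distribution and away from the generator's current output distribution.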

Why

This paper is important because it addresses the slow sampling speed of diffusion models, enabling near-real-time image generation with quality comparable to the original multi-step sampling.

How

The authors train a one-step generator with two losses: a distribution matching loss, whose gradient is estimated from the scores of two diffusion models (one modeling the real data distribution, the other the generator's output distribution), and a regression loss against a pre-computed dataset of noise-image pairs produced by the original multi-step diffusion model.
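Below is a minimal, hypothetical PyTorch-style sketch of one training step combining these two losses. The names and choices here (generator, real_score, fake_score, the eps-prediction parameterization, the (alpha, sigma) schedule, the gradient normalizer, lambda_reg, and plain MSE in place of the paper's LPIPS regression loss) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dmd_training_step(generator, real_score, fake_score, g_opt, fake_opt,
                      paired_noise, paired_target, alpha, sigma, lambda_reg=0.25):
    """One hypothetical DMD-style update. `real_score` is the frozen teacher,
    `fake_score` is trained online on the generator's outputs; both are assumed
    to be eps-prediction denoisers called as model(x_t, t)."""
    B, device = paired_noise.shape[0], paired_noise.device
    T = alpha.numel() - 1  # assumed fixed conditioning timestep for the one-step generator

    # 1) One-step generation from pure noise.
    z = torch.randn_like(paired_noise)
    x = generator(z, torch.full((B,), T, device=device))

    # 2) Distribution matching loss: diffuse the generated images to a random
    #    noise level and compare the two denoisers' predictions.
    t = torch.randint(1, alpha.numel(), (B,), device=device)
    eps = torch.randn_like(x)
    a, s = alpha[t].view(-1, 1, 1, 1), sigma[t].view(-1, 1, 1, 1)
    x_t = a * x + s * eps
    with torch.no_grad():
        x0_real = (x_t - s * real_score(x_t, t)) / a   # teacher's denoised estimate
        x0_fake = (x_t - s * fake_score(x_t, t)) / a   # fake-score model's denoised estimate
        # Up to a positive weight, this difference is the gradient of the approximate
        # KL w.r.t. the generated image: descending it moves x toward the real score
        # and away from the fake score.
        grad = x0_fake - x0_real
        grad = grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8)  # assumed normalizer
    # Surrogate MSE whose gradient w.r.t. x equals `grad` (up to the mean reduction).
    loss_dm = 0.5 * F.mse_loss(x, (x - grad).detach())

    # 3) Regression loss on pre-computed (noise, multi-step output) pairs.
    x_paired = generator(paired_noise, torch.full((B,), T, device=device))
    loss_reg = F.mse_loss(x_paired, paired_target)  # paper uses LPIPS; MSE here for brevity

    g_opt.zero_grad()
    (loss_dm + lambda_reg * loss_reg).backward()
    g_opt.step()

    # 4) Keep the fake-score model current: standard denoising loss on the
    #    generator's (detached) outputs.
    t2 = torch.randint(1, alpha.numel(), (B,), device=device)
    eps2 = torch.randn_like(x)
    a2, s2 = alpha[t2].view(-1, 1, 1, 1), sigma[t2].view(-1, 1, 1, 1)
    x_t2 = a2 * x.detach() + s2 * eps2
    loss_fake = F.mse_loss(fake_score(x_t2, t2), eps2)
    fake_opt.zero_grad()
    loss_fake.backward()
    fake_opt.step()

    return loss_dm.item(), loss_reg.item(), loss_fake.item()
```

The regression pairs are generated once offline by running the original diffusion sampler on stored noise inputs, per the paper's setup; the alternating update in step 4 keeps the fake-score model tracking the generator's distribution as it changes.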

Result

DMD outperforms published few-step diffusion distillation techniques, achieving FIDs of 2.62 on ImageNet 64x64 and 11.49 on zero-shot COCO-30k, comparable to Stable Diffusion but significantly faster (about 20 FPS with FP16 inference).

Limitations & Future Work

Limitations include a minor quality gap compared to multi-step diffusion and challenges in generating text and fine details. Future work involves distilling more advanced models and exploring variable guidance scales.

Abstract

Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce the one-step image generator to match the diffusion model at the distribution level, by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.