Elucidating the Exposure Bias in Diffusion Models
Authors: Mang Ning, Mingxiao Li, Jianlin Su, Albert Ali Salah, Itir Onal Ertugrul
What
This paper investigates the exposure bias problem in diffusion models: the mismatch between the inputs the network sees during training and those it receives during sampling, which leads to error accumulation and sampling drift. The paper analytically models the sampling distribution in the presence of prediction error, proposes a metric for quantifying exposure bias at each timestep, and introduces Epsilon Scaling, a training-free method that alleviates the issue by scaling down the network output during sampling.
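To make the metric concrete, here is a minimal sketch of one way a variance-based exposure-bias measure could be computed; the function names, the pixel-wise variance statistic, and the Monte Carlo comparison against the forward process are illustrative assumptions, not the paper's exact definition.

```python
import torch

@torch.no_grad()
def variance_error(sampled_x_t, x0_batch, alphas_cumprod, t):
    """Rough per-timestep exposure-bias measure (illustrative sketch).

    Compares the empirical variance of x_t reached *during sampling*
    with the variance of x_t drawn from the forward (training) process
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).
    """
    alpha_bar_t = alphas_cumprod[t]
    noise = torch.randn_like(x0_batch)
    train_x_t = torch.sqrt(alpha_bar_t) * x0_batch \
        + torch.sqrt(1.0 - alpha_bar_t) * noise

    # Per-sample variance over pixels, averaged over the batch
    var_sampled = sampled_x_t.flatten(1).var(dim=1).mean()
    var_train = train_x_t.flatten(1).var(dim=1).mean()
    return (var_sampled - var_train).abs()
```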
Why
The paper provides an in-depth analysis of the exposure bias problem in diffusion models, a key factor degrading sample quality, especially in fast (few-step) sampling. The proposed Epsilon Scaling method offers a simple yet effective way to improve sample quality without retraining, making it widely applicable across diffusion architectures and samplers.
How
The authors first analytically model the sampling distribution, taking the network's prediction error into account. They then propose a metric, the variance error, to quantify exposure bias at each timestep. To mitigate the issue, they introduce Epsilon Scaling, a training-free method that scales down the network output (epsilon) during sampling following a linear schedule derived from the accumulated error; a sketch of one sampling step is given below. The method is evaluated with FID on several datasets (CIFAR-10, LSUN, FFHQ, ImageNet) and diffusion frameworks (ADM, DDIM, DDPM, EDM, LDM, DiT, PFGM++).
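The following PyTorch-style sketch shows where Epsilon Scaling sits inside a standard DDPM ancestral sampling step; the slope k, intercept b, and the `model(x_t, t)` interface are illustrative assumptions, not the authors' exact implementation (their code is linked in the abstract below).

```python
import torch

def ddpm_step_with_epsilon_scaling(model, x_t, t, betas, alphas_cumprod,
                                   k=1e-3, b=1.0):
    """One DDPM ancestral sampling step with Epsilon Scaling (sketch).

    `model(x_t, t)` is assumed to predict the noise epsilon. Dividing by
    lambda_t = k * t + b > 1 shrinks the network output, pulling the
    sampling trajectory back towards the vector field seen in training.
    """
    eps = model(x_t, t)          # network noise prediction
    lambda_t = k * t + b         # linear scaling schedule (assumed form)
    eps = eps / lambda_t         # Epsilon Scaling: scale down epsilon

    alpha_t = 1.0 - betas[t]
    alpha_bar_t = alphas_cumprod[t]

    # Standard DDPM posterior mean, computed with the scaled epsilon
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bar_t) * eps) \
        / torch.sqrt(alpha_t)

    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise
```

Because only the sampling-time epsilon is rescaled, the trained network itself is untouched, which is what makes the method training-free and easy to attach to existing samplers.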
Result
Epsilon Scaling consistently improves FID across diffusion frameworks, datasets, and conditional settings. For instance, ADM-ES obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation, outperforming previous state-of-the-art stochastic samplers. The method is shown to reduce exposure bias by moving the sampling trajectory closer to the vector field learned during training, and it is insensitive to the exact value of the scaling parameter, so searching for a good value takes minimal effort.
Limitations & Future Work
The authors acknowledge that Epsilon Scaling corrects only the magnitude of the network's prediction error, not its direction, so room for improvement remains. Future work could explore correcting the direction error to further reduce exposure bias, and could investigate whether Epsilon Scaling is effective in diffusion-based applications beyond image generation, such as audio and video synthesis.
Abstract
Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then attribute the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output, mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code is available at https://github.com/forever208/ADM-ES and https://github.com/forever208/EDM-ES.