Analysis of Classifier-Free Guidance Weight Schedulers

Authors: Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernandez Abrevaya, David Picard, Vicky Kalogeiton

What

This paper investigates the use of dynamic weight schedulers in Classifier-Free Guidance (CFG) for diffusion models, showing that these schedulers can improve image fidelity, diversity, and textual adherence compared to static CFG.
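As context, CFG forms each denoising prediction by pushing the unconditional estimate toward the conditional one with a guidance weight; a dynamic scheduler simply makes that weight a function of the timestep instead of a constant. A minimal sketch (function name and toy inputs are illustrative, not from the paper):

```python
import numpy as np

def cfg_prediction(eps_uncond, eps_cond, w):
    """Classifier-Free Guidance: extrapolate from the unconditional
    noise prediction toward the conditional one with weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Static CFG applies the same w at every denoising step; a dynamic
# scheduler replaces w with a step-dependent w_t.
eps_u = np.zeros(4)  # toy unconditional noise prediction
eps_c = np.ones(4)   # toy conditional noise prediction
guided = cfg_prediction(eps_u, eps_c, 7.5)
```

With w = 1 this reduces to the plain conditional prediction; larger weights trade diversity for condition adherence, which is what the schedulers studied here modulate over time.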

Why


This paper matters because it offers a comprehensive analysis of dynamic guidance weight schedulers in CFG, a technique used by virtually all conditional diffusion models yet, as the authors note, previously adopted without rationale or analysis. The findings give practitioners simple, low-cost modifications that improve the performance of their diffusion models.

How

The authors conducted experiments on various tasks, including class-conditioned image generation and text-to-image generation, using datasets like CIFAR-10, ImageNet, and LAION. They evaluated different heuristic and parameterized dynamic schedulers, comparing their performance against static CFG using metrics like FID, Inception Score, CLIP-Score, and diversity measures. They also performed a user study to assess the perceptual quality of generated images.

Result

Key findings include: (1) monotonically increasing weight schedulers (e.g., linear and cosine) consistently improve performance over static CFG; (2) a simple linear scheduler significantly enhances results without additional computational cost or parameter tuning; (3) parameterized schedulers can further improve performance but require tuning for each model and task.
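The "single line of code" in finding (2) can be pictured as replacing a constant guidance weight with one that increases linearly over the sampling trajectory. A hedged sketch (the paper's exact normalization may differ; `linear_weight` and its arguments are illustrative):

```python
def linear_weight(step, num_steps, w_max):
    # Monotonically increasing linear schedule: weak guidance early
    # in sampling (high noise), full strength w_max at the last step.
    return w_max * (step + 1) / num_steps

# Static CFG would use w_max at every one of the 50 steps instead.
weights = [linear_weight(s, num_steps=50, w_max=7.5) for s in range(50)]
```

The intuition consistent with the results above is that strong guidance is most useful late in denoising, when fine detail and condition adherence are resolved, while early steps benefit from weaker guidance that preserves diversity.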

Limitations & Future Work

The authors acknowledge that the optimal parameters for parameterized schedulers do not generalize across different models and tasks. Future work could focus on developing more adaptable and robust parameterized schedulers. Another direction is to investigate the theoretical underpinnings of why dynamic schedulers work better than static CFG, leading to more principled design of these schedulers.

Abstract

Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models. It operates by combining the conditional and unconditional predictions using a fixed weight. However, recent works vary the weights throughout the diffusion process, reporting superior results but without providing any rationale or analysis. By conducting comprehensive experiments, this paper provides insights into CFG weight schedulers. Our findings suggest that simple, monotonically increasing weight schedulers consistently lead to improved performances, requiring merely a single line of code. In addition, more complex parametrized schedulers can be optimized for further improvement, but do not generalize across different models and tasks.