CAT: Contrastive Adapter Training for Personalized Image Generation
Authors: Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song
What
This paper introduces CAT (Contrastive Adapter Training), a method for personalized image generation with diffusion models. CAT uses a contrastive loss function to preserve the base model's knowledge while training adapters, improving upon existing methods such as LoRA and DreamBooth.
Why
Current personalized image generation techniques often corrupt the diffusion model's prior knowledge and underfit the target concept. The paper addresses these limitations with a novel training pipeline that combines contrastive learning with adapter training, better preserving the original model's capabilities and enabling more diverse, controllable generation.
How
The authors propose CAT, which adds a contrastive loss term to the adapter training objective. When no trigger token is present in the prompt, this loss pushes the adapted model's noise predictions toward the original model's predictions, preserving the base model's knowledge. The method is evaluated with established metrics such as prompt similarity and identity similarity, alongside a newly introduced metric, the Knowledge Preservation Score (KPS), which quantifies knowledge retention.
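The combined objective described above can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: function names, the `cat_weight` hyperparameter, and the use of a plain MSE for both terms are assumptions, and noise predictions are represented as flat lists of floats rather than tensors.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cat_objective(adapter_pred, target_noise,
                  adapter_pred_plain, base_pred_plain,
                  cat_weight=1.0):
    """Sketch of the CAT training loss (illustrative names/weights).

    adapter_pred       -- adapted model's noise prediction on a prompt
                          containing the trigger token
    target_noise       -- ground-truth noise added during training
    adapter_pred_plain -- adapted model's prediction on the same input
                          with the trigger token removed
    base_pred_plain    -- frozen base model's prediction on that
                          trigger-free prompt
    cat_weight         -- weight of the preservation term (assumed
                          hyperparameter, not from the paper)
    """
    # Standard diffusion denoising loss on the personalization data.
    denoise_loss = mse(adapter_pred, target_noise)
    # CAT term: without the trigger token, the adapted model should
    # predict the same noise as the original (frozen) model.
    preserve_loss = mse(adapter_pred_plain, base_pred_plain)
    return denoise_loss + cat_weight * preserve_loss
```

In an actual training loop, `base_pred_plain` would come from a frozen copy of the backbone (no gradients), so the preservation term only updates the adapter weights.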
Result
CAT outperforms existing adapter training methods in preserving the original model's knowledge while achieving comparable identity generation fidelity. This is demonstrated quantitatively with metrics such as KPS and qualitatively through comparisons of generated images, showing that CAT maintains diversity and avoids mode collapse.
Limitations & Future Work
The paper acknowledges limitations in evaluating diversity and fidelity, owing to the instability of CLIP-based scores, and notes that the impact of domain discrepancies between the model and the training data was not investigated. Future work aims to establish a reliable benchmark for consistent character generation, study the impact of CAT's structure and application more thoroughly, and extend CAT to multi-concept training with a per-token loss for improved multi-concept generation.
Abstract
The emergence of various adapters, including Low-Rank Adaptation (LoRA) adopted from the field of natural language processing, has allowed diffusion models to personalize image generation at a low cost. However, due to various challenges, including limited datasets and a shortage of regularization and computational resources, adapter training often produces unsatisfactory outcomes, corrupting the backbone model's prior knowledge. One well-known symptom is the loss of diversity in object generation, especially within the same class, which leads to generating almost identical objects with only minor variations, limiting the model's generative capabilities. To solve this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training through the application of a CAT loss. Our approach preserves the base model's original knowledge while the adapters are trained. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain this prior knowledge. We compare CAT's improvements both qualitatively and quantitatively. Finally, we discuss CAT's potential for multi-concept adapters and further optimization.