Unified Concept Editing in Diffusion Models
Authors: Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau
What
This paper introduces Unified Concept Editing (UCE), a closed-form model editing method for text-to-image diffusion models that can erase, moderate, and debias multiple concepts simultaneously without retraining.
Why
Existing methods handle bias, copyright, and offensive content in text-to-image models separately, even though these issues occur together in the same model. UCE offers a single, efficient, and scalable method that addresses them concurrently, supporting safer and more responsible deployment of these models.
How
UCE builds on prior model editing techniques such as TIME and MEMIT, generalizing their closed-form weight updates for linear projection layers in diffusion models. It directly modifies the cross-attention weights that project text embeddings, so that the embedding of an edited concept maps to a chosen target output. The target depends on the edit type: erasure maps the concept to the output of a different concept, debiasing adjusts the magnitudes of the attributes associated with the concept, and moderation maps the concept to a generic output.
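To make the idea of a closed-form edit concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the edit can be posed as a least-squares problem over one linear projection: edited embeddings should map to chosen targets, preserved embeddings should keep their old outputs, and a small ridge term (an assumption added here for numerical stability) keeps the new weights close to the old ones. The function name closed_form_edit, the reg parameter, and the random toy embeddings are all hypothetical.

```python
import numpy as np

def closed_form_edit(W_old, edit_keys, edit_targets, preserve_keys, reg=1e-6):
    """Sketch of a closed-form edit of one linear projection (hypothetical helper).

    Minimizes over W:
        sum_i ||W c_i - v_i||^2            (edited concepts c_i hit targets v_i)
      + sum_j ||W c_j - W_old c_j||^2      (preserved concepts c_j keep old outputs)
      + reg * ||W - W_old||_F^2            (assumed ridge term toward W_old)

    W_old:         (d_out, d_in) original projection weights
    edit_keys:     (n_edit, d_in) text embeddings to edit
    edit_targets:  (n_edit, d_out) desired outputs for those embeddings
    preserve_keys: (n_keep, d_in) text embeddings whose outputs should not change
    """
    d_in = W_old.shape[1]
    keep_cov = preserve_keys.T @ preserve_keys
    # Setting the gradient to zero gives W @ B = A with:
    A = edit_targets.T @ edit_keys + W_old @ keep_cov + reg * W_old
    B = edit_keys.T @ edit_keys + keep_cov + reg * np.eye(d_in)
    # B is symmetric, so W = A B^{-1} is equivalent to solving B W^T = A^T.
    return np.linalg.solve(B, A.T).T

# Toy example with random vectors standing in for text embeddings (assumption).
rng = np.random.default_rng(0)
d_in, d_out = 16, 4
W_old = rng.normal(size=(d_out, d_in))
c_concept = rng.normal(size=d_in)       # concept to erase or moderate
c_generic = rng.normal(size=d_in)       # replacement or generic concept
preserve = rng.normal(size=(3, d_in))   # unrelated concepts to leave intact

# Erasure/moderation-style target: the output the old weights gave the replacement.
target = W_old @ c_generic
W_new = closed_form_edit(W_old, c_concept[None, :], target[None, :], preserve)
```

In this sketch, only the target changes between edit types. A debiasing-style target could, for instance, be built by adding scaled attribute directions to the concept's old output; the scaling scheme is not specified in this summary and is left out here.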
Result
UCE erases artistic styles with less interference on unrelated concepts than baselines such as ESD and Concept Ablation. It reduces gender and racial bias in how professions are depicted, yielding more balanced attribute distributions than existing methods. For NSFW content moderation, it matches or exceeds ESD while maintaining higher image quality and text-image alignment.
Limitations and Future Work
The authors acknowledge limitations in addressing compounding biases when debiasing across multiple attributes, as well as challenges posed by compositional bias effects in prompts. They also note that excessive artistic style erasures can degrade overall model performance, suggesting a need to preserve a critical mass of artistic knowledge. Future work could focus on mitigating these limitations, exploring joint attribute debiasing, and developing techniques to handle compositional bias.
Abstract
Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a method that tackles all issues with a single approach. Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution, and scales seamlessly to concurrent edits on text-conditional diffusion models. We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections, and we present extensive experiments demonstrating improved efficacy and scalability over prior work. Our code is available at https://unified.baulab.info