Editing Massive Concepts in Text-to-Image Diffusion Models

Authors: Tianwei Xiong, Yue Wu, Enze Xie, Yue Wu, Zhenguo Li, Xihui Liu

What

This paper introduces EMCID, a two-stage method for editing large numbers of concepts in text-to-image diffusion models, addressing issues like outdated information, biases, and copyright infringement.

Why

EMCID offers a practical way to curb problematic content generation in large diffusion models when many concepts must be edited at once, which matters for deploying these models safely and responsibly in real-world applications.

How

EMCID first optimizes individual concept representations in the text encoder using dual self-distillation from text alignment and noise prediction losses. The second stage then aggregates these optimized representations and edits multiple layers of the model using a closed-form solution.
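The closed-form step in the second stage can be illustrated with a generic regularized least-squares weight update. The sketch below is not the authors' exact formulation: it assumes a single linear layer with weights `W`, a matrix of concept "keys" `K` (one column per concept), and the optimized target outputs `V_target` from stage one. The function name `closed_form_edit` and the regularization strength `lam` are hypothetical, chosen for illustration only.

```python
import numpy as np


def closed_form_edit(W, K, V_target, lam=1e-2):
    """Edit a linear layer so that W_new @ K is close to V_target.

    Illustrative assumption, not the paper's exact objective. We minimize
        ||dW @ K - R||_F^2 + lam * ||dW||_F^2,   R = V_target - W @ K,
    whose closed-form solution is dW = R K^T (K K^T + lam I)^{-1}.

    Shapes: W (d_out, d_in), K (d_in, n), V_target (d_out, n).
    """
    R = V_target - W @ K                  # residual the edit must absorb
    d_in = K.shape[0]
    A = K @ K.T + lam * np.eye(d_in)      # symmetric positive definite
    # A is symmetric, so dW^T = A^{-1} (K R^T); solve instead of inverting.
    dW = np.linalg.solve(A, K @ R.T).T
    return W + dW


# Toy usage: edit 3 "concepts" in an 8 -> 4 linear layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
K = rng.normal(size=(8, 3))
V_target = rng.normal(size=(4, 3))
W_new = closed_form_edit(W, K, V_target, lam=1e-6)
print(np.linalg.norm(W @ K - V_target), np.linalg.norm(W_new @ K - V_target))
```

Because all concepts enter through the columns of `K`, one linear solve handles every edit at once, which is what makes this family of updates attractive at large scale. The sketch covers one layer only; how the update is distributed across multiple layers is specified in the paper and is not reproduced here.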

Result

EMCID demonstrates superior scalability compared to previous methods, successfully editing up to 1,000 concepts while preserving the model’s generation quality. It excels in updating, erasing, and rectifying concepts, as evidenced by extensive evaluations on the proposed ImageNet Concept Editing Benchmark (ICEB) and other benchmarks.

Limitations and Future Work

The authors acknowledge that EMCID might not effectively eliminate NSFW content generation, particularly from prompts with low toxicity. Future work could focus on addressing this limitation, potentially by combining EMCID with methods targeting other parts of the diffusion model.

Abstract

Text-to-image diffusion models suffer from the risk of generating outdated, copyrighted, incorrect, and biased content. While previous methods have mitigated the issues on a small scale, it is essential to handle them simultaneously in larger-scale real-world scenarios. We propose a two-stage method, Editing Massive Concepts In Diffusion Models (EMCID). The first stage performs memory optimization for each individual concept with dual self-distillation from text alignment loss and diffusion noise prediction loss. The second stage conducts massive concept editing with multi-layer, closed-form model editing. We further propose a comprehensive benchmark, named ImageNet Concept Editing Benchmark (ICEB), for evaluating massive concept editing for T2I models with two subtasks, free-form prompts, massive concept categories, and extensive evaluation metrics. Extensive experiments conducted on our proposed benchmark and previous benchmarks demonstrate the superior scalability of EMCID for editing up to 1,000 concepts, providing a practical approach for fast adjustment and re-deployment of T2I diffusion models in real-world applications.