On Mechanistic Knowledge Localization in Text-to-Image Generative Models
Authors: Samyadeep Basu, Keivan Rezaei, Priyatham Kattakinda, Ryan Rossi, Cherry Zhao, Vlad Morariu, Varun Manjunatha, Soheil Feizi
What
This paper investigates the localization of knowledge within text-to-image generative models, particularly focusing on identifying specific layers responsible for controlling visual attributes like “style”, “objects”, and “facts”.
Why
This work matters because it clarifies how knowledge is represented inside these models, which in turn enables efficient model-editing techniques for tasks like removing specific styles, modifying objects, or updating factual information.
How
The authors first analyze the effectiveness of causal tracing in localizing knowledge across various text-to-image models, including SD-XL and DeepFloyd. They then introduce LocoGen, a method that pinpoints the layers controlling visual attributes by intervening on the cross-attention layers of the UNet. Finally, they apply LocoEdit, a fast closed-form editing method, at the identified locations and evaluate its effectiveness.
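To make the intervention concrete, here is a minimal sketch of a LocoGen-style probe using Hugging Face diffusers: a window of consecutive cross-attention layers is fed the embedding of an altered prompt (with the attribute removed) while all other layers still see the original prompt. The model id, prompts, window size, and the assumption that cross-attention modules are named "*.attn2" and receive `encoder_hidden_states` as a keyword argument are illustrative, not the authors' exact setup.

```python
# Minimal sketch of a LocoGen-style intervention (illustrative, not the authors' code).
# A window of consecutive cross-attention layers receives the embedding of an
# altered prompt (attribute removed) while the rest of the UNet sees the original.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt):
    """CLIP text embeddings for a single prompt, shape [1, 77, 768]."""
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").input_ids.to("cuda")
    return pipe.text_encoder(ids)[0]

original = "a house in the style of Van Gogh"
neutral_emb = encode("a house")   # attribute ("style") removed
uncond_emb = encode("")           # classifier-free guidance doubles the batch
replacement = torch.cat([uncond_emb, neutral_emb], dim=0)

# Cross-attention modules in diffusers UNets are conventionally named "*.attn2"
# and receive the text conditioning via the `encoder_hidden_states` kwarg.
cross_attn = [m for n, m in pipe.unet.named_modules() if n.endswith("attn2")]

def swap_conditioning(module, args, kwargs):
    kwargs["encoder_hidden_states"] = replacement
    return args, kwargs

# Slide a small window over the cross-attention layers; if the style vanishes
# only when a particular window is intervened on, knowledge is localized there.
WINDOW = 3  # illustrative; the window size is a tunable choice per model
for start in range(len(cross_attn) - WINDOW + 1):
    handles = [m.register_forward_pre_hook(swap_conditioning, with_kwargs=True)
               for m in cross_attn[start:start + WINDOW]]
    image = pipe(prompt=original, num_inference_steps=30).images[0]
    image.save(f"intervened_{start}.png")  # inspect: did the Van Gogh style disappear?
    for h in handles:
        h.remove()
```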
Result
The research demonstrates that LocoGen identifies a small, distinct set of cross-attention layers controlling visual attributes across different text-to-image models. Moreover, LocoEdit successfully applies edits at these locations for most models; DeepFloyd is the exception, as closed-form edits are hampered by the bi-directional attention in its T5 text encoder. Notably, the study reveals that knowledge about specific styles can be localized to even a small subset of neurons, highlighting the potential for neuron-level model editing.
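As a rough illustration of what neuron-level localization can look like, the sketch below (reusing `pipe` and `encode` from the previous snippet) ranks output neurons of a value projection in one localized cross-attention layer by how differently they respond to a style prompt versus a neutral prompt, then zeroes out the top-scoring ones. The layer name, selection criterion, and number of neurons are assumptions for illustration; the paper's actual neuron-selection procedure may differ.

```python
# Hedged sketch of neuron-level localization (reuses `pipe` and `encode` from above).
# Rank output neurons of a value projection by how differently they respond to a
# style prompt vs. a neutral prompt, then knock out the top-k and regenerate.
import torch

LAYER = "up_blocks.1.attentions.1.transformer_blocks.0.attn2"  # assumed localized layer
attn = dict(pipe.unet.named_modules())[LAYER]

style_emb = encode("a painting in the style of Van Gogh")
neutral_emb = encode("a painting")

with torch.no_grad():
    v_style = attn.to_v(style_emb)      # value activations, [1, 77, inner_dim]
    v_neutral = attn.to_v(neutral_emb)
    score = (v_style - v_neutral).abs().mean(dim=(0, 1))  # per-neuron discrepancy
    top_neurons = score.topk(k=32).indices                # candidate "style" neurons

backup = attn.to_v.weight.detach().clone()  # keep a copy to undo the ablation
with torch.no_grad():
    attn.to_v.weight[top_neurons, :] = 0.0  # zero the selected output neurons

# Regenerate: if the Van Gogh style disappears while the content is preserved,
# knowledge about the style is (approximately) localized to these neurons.
image = pipe(prompt="a painting in the style of Van Gogh",
             num_inference_steps=30).images[0]
image.save("style_neurons_ablated.png")
```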
Limitations & Future Work
The authors acknowledge limitations in applying closed-form edits to DeepFloyd and suggest exploring fast editing methods for models utilizing bi-directional attention as future work. Further research directions include investigating the generalizability of neuron-level editing beyond “style” to other attributes like “objects” and “facts”.
Abstract
Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates. Recent work leveraging causal tracing shows that early Stable-Diffusion variants confine knowledge primarily to the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. Extending this framework, we observe that for recent models (e.g., SD-XL, DeepFloyd), causal tracing fails to pinpoint localized knowledge, highlighting challenges in model editing. To address this issue, we introduce the concept of Mechanistic Localization in text-to-image models, where knowledge about various visual attributes (e.g., “style”, “objects”, “facts”) can be mechanistically localized to a small fraction of layers in the UNet, thus facilitating efficient model editing. We localize knowledge using our method LocoGen, which measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. We then employ LocoEdit, a fast closed-form editing method, across popular open-source text-to-image models (including the latest SD-XL) and explore the possibilities of neuron-level model editing. Using Mechanistic Localization, our work offers a better view of successes and failures in localization-based text-to-image model editing. Code will be available at https://github.com/samyadeepbasu/LocoGen.
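For intuition on what a "fast closed-form editing method" can look like in this setting, here is a hedged sketch of a ridge-regression-style update to the key/value projections of the localized cross-attention layers, in the spirit of LocoEdit and earlier closed-form editors (e.g., TIME, UCE). The objective, regularization strength, prompt pairing, and choice of edited layers are illustrative assumptions rather than the paper's exact procedure; the snippet reuses `pipe` and `encode` from the first sketch.

```python
# Hedged sketch of a closed-form projection-matrix edit (not the paper's exact method).
# Solves   min_W'  sum_i ||W' c_i - W t_i||^2 + lam * ||W' - W||_F^2
# so that source tokens c_i are mapped to where the old weights map target tokens t_i.
import torch

def closed_form_edit(W, src_embs, tgt_embs, lam=0.1):
    """W: [out_dim, d] projection weight; src_embs, tgt_embs: [n, d] token embeddings."""
    W32 = W.float()
    C = src_embs.float().T                                      # [d, n]
    T = tgt_embs.float().T                                      # [d, n]
    A = W32 @ T @ C.T + lam * W32                               # [out_dim, d]
    B = C @ C.T + lam * torch.eye(C.shape[0], device=C.device)  # [d, d]
    return A @ torch.linalg.inv(B)                              # new weight, [out_dim, d]

# Example: make "style of Van Gogh" prompts behave like a plain prompt.
# (Token-wise pairing of padded prompts is a simplification for illustration.)
src = encode("a painting in the style of Van Gogh")[0]  # [77, 768]
tgt = encode("a painting")[0]                           # [77, 768]

for name, module in pipe.unet.named_modules():
    if name.endswith("attn2") and name.startswith("up_blocks.1"):  # assumed localized layers
        for proj in (module.to_k, module.to_v):
            proj.weight.data = closed_form_edit(proj.weight.data, src, tgt).to(proj.weight.dtype)
```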