Implicit Style-Content Separation using B-LoRA
Authors: Yarden Frenkel, Yael Vinker, Ariel Shamir, Daniel Cohen-Or
What
This paper presents B-LoRA, a method for implicit style-content separation in a single image, using Low-Rank Adaptation (LoRA) applied to specific transformer blocks of Stable Diffusion XL (SDXL). The separation enables a range of image stylization tasks, including style transfer, text-guided stylization, and consistent style generation.
Why
B-LoRA addresses the limitations of existing image stylization techniques, including overfitting issues associated with model fine-tuning and the need for separate models for style and content. By achieving style-content separation within a single image using a lightweight adapter, it offers flexibility, efficiency, and robust stylization capabilities.
How
The authors analyzed SDXL’s architecture to identify specific transformer blocks responsible for content and style. They then trained LoRA on these blocks (B-LoRAs) using a single input image and a general text prompt, resulting in an implicit style-content decomposition. The trained B-LoRAs can then be applied to various style manipulation tasks without additional training.
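The mechanism can be illustrated with a toy NumPy sketch (not the authors' code, and the block names, sizes, and ranks below are illustrative assumptions): each B-LoRA adds a low-rank update W' = W + (alpha/r) · B·A to a frozen block's weight, and because the content and style adapters live in different blocks, adapters trained on different images can be mixed without retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_lora(W, A, B, alpha=1.0):
    """Low-rank update: W' = W + (alpha / r) * B @ A, where r is the LoRA rank."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

d = 8   # toy hidden size (real SDXL blocks are far larger)
r = 2   # LoRA rank

# Frozen base weights for two hypothetical transformer blocks,
# stand-ins for the "content" and "style" blocks the paper identifies.
W_content_block = rng.normal(size=(d, d))
W_style_block = rng.normal(size=(d, d))

# B-LoRA pairs: A is randomly initialized, B starts at zero,
# so each adapter is a no-op before training.
A_content, B_content = rng.normal(size=(r, d)), np.zeros((d, r))
A_style, B_style = rng.normal(size=(r, d)), np.zeros((d, r))

# With B at zero, attaching the adapters leaves the model unchanged.
assert np.allclose(apply_lora(W_content_block, A_content, B_content),
                   W_content_block)

# Pretend training has updated B (in practice both A and B are optimized
# jointly on a single image with a general text prompt).
B_content = rng.normal(size=(d, r))
B_style = rng.normal(size=(d, r))

# Mixing: attach the content adapter from one image and the style adapter
# from another, each to its own block -- no additional training needed.
W_content_adapted = apply_lora(W_content_block, A_content, B_content)
W_style_adapted = apply_lora(W_style_block, A_style, B_style)
```

The zero-initialized B matrix is standard LoRA practice: it guarantees the adapted model starts out identical to the base model, so training only moves the weights as far as the single input image requires.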
Result
B-LoRA effectively disentangles style and content, enabling high-quality image style transfer, text-guided style manipulation, and consistent style generation even for challenging inputs like stylized images and complex scenes. Extensive qualitative and quantitative evaluations, including a user study, demonstrate its superiority over alternative approaches.
Limitations & Future Work
The authors acknowledge limitations such as color separation affecting identity preservation, potential style leakage from background elements in style images, and challenges with complex scenes. They suggest future work focusing on finer style-content sub-component separation and extending B-LoRA for multi-object and multi-style combinations.
Abstract
Image stylization involves manipulating the visual appearance and texture (style) of an image while preserving its underlying objects, structures, and concepts (content). The separation of style and content is essential for manipulating the image’s style independently from its content, ensuring a harmonious and visually pleasing result. Achieving this separation requires a deep understanding of both the visual and semantic characteristics of images, often necessitating the training of specialized models or employing heavy optimization. In this paper, we introduce B-LoRA, a method that leverages LoRA (Low-Rank Adaptation) to implicitly separate the style and content components of a single image, facilitating various image stylization tasks. By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks (referred to as B-LoRAs) achieves style-content separation that cannot be achieved by training each B-LoRA independently. Consolidating the training into only two blocks and separating style and content allows for significantly improving style manipulation and overcoming overfitting issues often associated with model fine-tuning. Once trained, the two B-LoRAs can be used as independent components to allow various image stylization tasks, including image style transfer, text-based image stylization, consistent style generation, and style-content mixing.