ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Authors: Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

What

This paper introduces ZipLoRA, an optimization-based method for merging independently trained style and subject LoRAs (Low-Rank Adaptations) of text-to-image diffusion models. The merged model can render any user-provided subject in any user-provided style, enabling personalized, stylized image generation.

Why

This paper addresses a key limitation of existing text-to-image personalization methods: the inability to combine a specific subject with a specific style in a controllable and efficient manner. By cheaply merging independently trained LoRAs, it enables versatile, personalized image generation that preserves both the subject's identity and the desired style.

How

The authors leverage two key insights: (1) LoRA weight update matrices are sparse, and (2) directly summing two LoRAs whose weight columns are highly aligned degrades performance. Building on these, they propose an optimization method that learns to merge the style and subject LoRAs by minimizing a loss that encourages both style fidelity and subject fidelity while penalizing signal interference between the two LoRAs.
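The column-wise merging idea can be sketched as a toy example. This is an illustrative, hedged sketch, not the authors' implementation: random matrices stand in for real LoRA weight updates, and `zip_merge` and `column_cosines` are hypothetical names introduced here for clarity.

```python
import numpy as np

def zip_merge(dW_style, dW_subject, m_style, m_subject):
    """Column-wise merge: scale each column of the two LoRA weight updates
    by its learned merger coefficient, then sum. m_style / m_subject hold
    one coefficient per column (illustrative sketch of the merging scheme)."""
    return dW_style * m_style + dW_subject * m_subject

def column_cosines(A, B):
    """Absolute cosine similarity between matching columns of A and B.
    High values indicate the alignment that makes naive merging interfere."""
    num = np.abs((A * B).sum(axis=0))
    den = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0) + 1e-8
    return num / den

# Toy stand-ins for style and subject LoRA updates (d_out x d_in).
rng = np.random.default_rng(0)
dW_s = rng.normal(size=(8, 6))
dW_c = rng.normal(size=(8, 6))

# With all coefficients set to 1 the merge reduces to direct addition,
# which is exactly the naive baseline that underperforms when columns align.
direct = zip_merge(dW_s, dW_c, np.ones(6), np.ones(6))
print(column_cosines(dW_s, dW_c))  # per-column alignment diagnostic
```

In the actual method, the coefficients are optimized against style-fidelity, subject-fidelity, and interference terms rather than fixed by hand; the diagnostic above only illustrates why aligned columns motivate learned, per-column coefficients.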

Result

ZipLoRA outperforms direct merging, joint training, and StyleDrop. It generates stylized images that preserve subject fidelity, allows control over the extent of stylization, and retains the ability to generate each individual concept (subject or style) accurately, demonstrating its versatility. User studies and quantitative metrics further confirm ZipLoRA's effectiveness for personalized stylization.

LF

The authors do not explicitly mention limitations. Potential areas for future work could include: (1) extending ZipLoRA to handle multiple styles or subjects, (2) exploring alternative optimization strategies or regularization techniques for more robust merging, and (3) investigating the application of ZipLoRA to diffusion-based generative tasks beyond image stylization.

Abstract

Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io