MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

Authors: Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, Wangmeng Zuo

What

This paper introduces MasterWeaver, a novel method for personalized text-to-image generation that prioritizes both accurate identity representation and flexible image editing capabilities from a single reference image.

Why

This paper addresses the limitations of existing personalized text-to-image generation models, which often struggle to balance accurate identity preservation with flexible editing. MasterWeaver’s ability to achieve both makes it a valuable tool for various applications, including personalized content creation.

How

MasterWeaver leverages a pre-trained Stable Diffusion model and incorporates an identity mapping network to inject identity features into the image generation process. It introduces an editing direction loss to improve text controllability and utilizes a face-augmented dataset to disentangle identity features from attributes, enhancing editability.
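To make the identity-injection step concrete, the sketch below pictures it as an extra cross-attention branch whose output is summed with the usual text cross-attention output inside a diffusion U-Net block. This is a minimal illustration of the idea only; the class and argument names (`IdentityCrossAttention`, `id_embeds`, `scale`) are placeholders, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class IdentityCrossAttention(nn.Module):
    """Hypothetical extra cross-attention branch that injects identity
    features into a diffusion U-Net block (a sketch, not the paper's code)."""

    def __init__(self, query_dim: int, id_dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=query_dim, num_heads=heads,
            kdim=id_dim, vdim=id_dim, batch_first=True,
        )

    def forward(self, hidden_states: torch.Tensor, id_embeds: torch.Tensor) -> torch.Tensor:
        # hidden_states: (B, N, query_dim) spatial tokens from the U-Net block
        # id_embeds:     (B, M, id_dim) identity tokens from the face encoder
        out, _ = self.attn(hidden_states, id_embeds, id_embeds)
        return out


def inject_identity(text_attn_out, hidden_states, id_embeds, id_attn, scale=1.0):
    """Sum the text cross-attention output with the identity branch,
    weighted by a scale factor controlling identity strength."""
    return text_attn_out + scale * id_attn(hidden_states, id_embeds)
```

A scale of 0 recovers the original text-only model, which is why this kind of additive branch keeps the base model's behavior available during editing.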

Result

Experimental results demonstrate that MasterWeaver outperforms state-of-the-art methods in terms of identity fidelity, text alignment, and image quality. It produces high-quality personalized images with diverse attributes, clothing, backgrounds, and styles, even from a single reference image.

Limitations and Future Work

The authors acknowledge limitations in generating images with multiple personalized identities and achieving precise control over fine-grained attributes. Future work will address these limitations and explore ethical considerations related to potential deepfake generation.

Abstract

Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by reference images. Although promising identity fidelity has been achieved by several tuning-free methods, they usually suffer from overfitting: the learned identity tends to entangle with irrelevant information, resulting in unsatisfactory text controllability, especially on faces. In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability. Specifically, MasterWeaver adopts an encoder to extract identity features and steers image generation through additionally introduced cross-attention. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of MasterWeaver with those of the original T2I model. Additionally, a face-augmented dataset is constructed to facilitate disentangled identity learning and further improve editability. Extensive experiments demonstrate that MasterWeaver not only generates personalized images with faithful identity but also exhibits superior text controllability. Our code will be publicly available at https://github.com/csyxwei/MasterWeaver.
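One way to read the editing direction loss described above is: for a prompt and an edited variant of it (e.g., "a photo of a person" vs. "a photo of a smiling person"), the change in the denoiser's prediction defines an editing direction, and the personalized model's direction is encouraged to match that of the frozen original T2I model. The sketch below follows this interpretation; the function names and the choice of a cosine misalignment penalty are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def editing_direction_loss(personalized_unet, frozen_unet,
                           noisy_latents, timesteps,
                           text_embeds, edited_text_embeds, id_embeds):
    """Sketch of an editing-direction alignment loss (assumed formulation).

    The editing direction is the change in predicted noise when the prompt
    is edited; the personalized model's direction is pushed toward the
    frozen original T2I model's direction.
    """
    # Editing direction of the personalized (identity-conditioned) model.
    eps_p = personalized_unet(noisy_latents, timesteps, text_embeds, id_embeds)
    eps_p_edit = personalized_unet(noisy_latents, timesteps, edited_text_embeds, id_embeds)
    dir_personalized = eps_p_edit - eps_p

    # Editing direction of the frozen original T2I model (no identity conditioning).
    with torch.no_grad():
        eps_o = frozen_unet(noisy_latents, timesteps, text_embeds)
        eps_o_edit = frozen_unet(noisy_latents, timesteps, edited_text_embeds)
    dir_original = eps_o_edit - eps_o

    # Penalize misalignment between the two editing directions.
    cos = F.cosine_similarity(dir_personalized.flatten(1),
                              dir_original.flatten(1), dim=1)
    return (1.0 - cos).mean()
```

Intuitively, this term only constrains how generations *change* under prompt edits, leaving the standard denoising loss to preserve the reference identity.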