Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Authors: Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
What
This paper introduces Direct Inversion, a technique for inverting diffusion models in text-based image editing. It disentangles the source and target diffusion branches so that each specializes in its own role: preserving the essential content of the source image and achieving fidelity to the target prompt, respectively.
Why
The paper addresses limitations of existing inversion techniques for diffusion-based, text-guided image editing, which often rely on computationally expensive optimization and tend to compromise either content preservation or edit fidelity. The authors argue that disentangling the source and target branches lets each objective be optimized separately, and they introduce a new benchmark dataset for evaluation.
How
The authors propose Direct Inversion, which rectifies the deviation of the source branch's denoising path using a simple three-line modification to DDIM inversion. They also introduce PIE-Bench, a new benchmark of 700 images spanning diverse scenes and editing categories, and use it to evaluate the method across 8 editing techniques and against existing inversion methods with 7 evaluation metrics.
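The core idea can be illustrated with a short sketch. The snippet below is a conceptual approximation, not the authors' released code: denoise_step is a hypothetical wrapper around one DDIM denoising step of the diffusion model, and inversion_latents is assumed to hold the latent trajectory [z_0, ..., z_T] recorded during DDIM inversion of the source image. After every step, the source branch is snapped back onto that recorded trajectory so it stays faithful to the source content, while the target branch denoises freely under the target prompt.

# Conceptual sketch of Direct Inversion (hypothetical helper names, not the paper's code).
# inversion_latents = [z_0, z_1, ..., z_T] recorded during DDIM inversion of the source image.
# denoise_step(z, prompt, t) performs one DDIM denoising step from timestep t to t-1.
def edit_with_direct_inversion(inversion_latents, source_prompt, target_prompt, denoise_step):
    num_steps = len(inversion_latents) - 1
    z_src = inversion_latents[-1]  # z_T seeds both branches
    z_tgt = inversion_latents[-1]
    for i in range(num_steps):
        t = num_steps - i  # current timestep index: T, T-1, ..., 1
        # Target branch: ordinary denoising under the target prompt (edit fidelity).
        z_tgt = denoise_step(z_tgt, target_prompt, t)
        # Source branch: denoise under the source prompt, then rectify the deviation
        # by returning to the recorded inversion trajectory (content preservation).
        # The source-branch forward pass still matters because editing methods such as
        # Prompt-to-Prompt inject its attention features into the target branch.
        z_src = denoise_step(z_src, source_prompt, t)
        z_src = inversion_latents[t - 1]  # the "direct inversion" correction
    return z_tgt  # decode with the VAE to obtain the edited image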
Result
Direct Inversion outperforms existing optimization-based inversion methods, with significant gains in essential content preservation (up to 83.2% improvement in Structure Distance) and edit fidelity (up to 8.8% improvement in Edit Region CLIP Similarity), while being significantly faster. When integrated with other editing techniques, it also improves content preservation by up to 20.2% and edit fidelity by up to 2.5%.
Limitations & Future Work
The authors acknowledge limitations inherited from existing diffusion-based editing methods, such as instability and low success rates in certain complex editing scenarios. Future work includes extending the approach to video editing, developing more robust editing models, and creating more comprehensive evaluation metrics.
Abstract
Text-guided diffusion models have revolutionized image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model. This vector is subsequently fed into separate source and target diffusion branches for editing. The accuracy of this inversion process significantly impacts the final editing outcome, influencing both essential content preservation of the source image and edit fidelity according to the target prompt. Prior inversion techniques aimed at finding a unified solution in both the source and target diffusion branches. However, our theoretical and empirical analyses reveal that disentangling these branches leads to a distinct separation of responsibilities for preserving essential content and ensuring edit fidelity. Building on this insight, we introduce “Direct Inversion,” a novel technique achieving optimal performance of both branches with just three lines of code. To assess image editing performance, we present PIE-Bench, an editing benchmark with 700 images showcasing diverse scenes and editing types, accompanied by versatile annotations and comprehensive evaluation metrics. Compared to state-of-the-art optimization-based inversion techniques, our solution not only yields superior performance across 8 editing methods but also achieves nearly an order of magnitude speed-up.
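For concreteness, the following is a minimal sketch of the DDIM inversion procedure that produces the noisy latent and the recorded trajectory used in the editing sketch above. It assumes a PyTorch-style noise-prediction UNet and a precomputed cumulative alpha schedule; the function name and signature are illustrative, not the paper's or any library's actual API.

import torch

@torch.no_grad()
def ddim_invert(z0, prompt_embedding, unet, alphas_cumprod, timesteps):
    # Deterministic DDIM run in reverse: map the clean latent z_0 of the source
    # image to a noisy latent z_T, recording every intermediate latent.
    # unet(z, t, c) predicts the noise eps; alphas_cumprod[t] is alpha-bar_t.
    z = z0
    trajectory = [z0]
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):  # low noise -> high noise
        a_cur = alphas_cumprod[t_cur]
        a_next = alphas_cumprod[t_next]
        eps = unet(z, t_cur, prompt_embedding)
        # Predict the clean latent implied by the current noisy latent.
        z0_pred = (z - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        # Reverse DDIM update: step one level up the noise schedule.
        z = a_next.sqrt() * z0_pred + (1 - a_next).sqrt() * eps
        trajectory.append(z)
    return trajectory  # [z_0, z_1, ..., z_T]; trajectory[-1] seeds both editing branches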