ReNoise: Real Image Inversion Through Iterative Noising
Authors: Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Daniel Cohen-Or
What
This paper proposes ReNoise, a new diffusion model inversion method that enhances reconstruction accuracy and editability, especially for recent few-step models, without increasing computational cost.
Why
Existing inversion methods limit real image editing with diffusion models, particularly for few-step models, which are essential for interactive editing workflows. ReNoise addresses these limitations.
How
The authors developed ReNoise, a technique based on fixed-point iteration that refines the approximation of points along the forward diffusion trajectory during the inversion process. This is achieved by iteratively renoising the latent representation using the pre-trained diffusion model and averaging the resulting predictions. They also introduce techniques to enhance editability and correct noise in non-deterministic samplers.
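The renoising loop can be illustrated with a minimal sketch. It assumes a deterministic DDIM-style sampler, and it is not the authors' implementation. `toy_eps` is a hypothetical stand-in for the pretrained noise predictor. The function names, the values of `n_iters` and `avg_last`, and the choice to average only the last few iterates are all illustrative assumptions. `a_prev` and `a_t` denote the cumulative noise-schedule coefficients at the previous and current timestep.

```python
import numpy as np

def toy_eps(z, t):
    # Hypothetical stand-in for a pretrained noise predictor (not a real UNet).
    return 0.5 * np.tanh(z) + 0.1 * t

def inverse_step(z_prev, z_est, t, a_prev, a_t, eps_fn):
    # Reversed DDIM update. The noise is predicted at z_est, the current
    # estimate of the noisier latent, instead of at z_prev.
    eps = eps_fn(z_est, t)
    phi = np.sqrt(a_t / a_prev)
    psi = np.sqrt(1.0 - a_t) - np.sqrt(a_t * (1.0 - a_prev) / a_prev)
    return phi * z_prev + psi * eps

def renoise_step(z_prev, t, a_prev, a_t, eps_fn, n_iters=4, avg_last=2):
    # Fixed-point iteration: start from z_prev (the first iterate equals the
    # plain sampler-reversal estimate), then repeatedly re-apply the
    # inversion step at the newest estimate.
    z_est = z_prev
    iterates = []
    for _ in range(n_iters):
        z_est = inverse_step(z_prev, z_est, t, a_prev, a_t, eps_fn)
        iterates.append(z_est)
    # Average the last few estimates (an assumed averaging choice).
    return np.mean(iterates[-avg_last:], axis=0)

def fixed_point_residual(z, z_prev, t, a_prev, a_t, eps_fn):
    # How far z is from satisfying z = inverse_step(z_prev, z, ...).
    return np.linalg.norm(z - inverse_step(z_prev, z, t, a_prev, a_t, eps_fn))

z_prev = np.linspace(-2.0, 2.0, 8)
args = (1.0, 0.9, 0.5, toy_eps)  # t, a_prev, a_t, noise predictor
plain = inverse_step(z_prev, z_prev, *args)  # plain sampler reversal
refined = renoise_step(z_prev, *args)        # renoised estimate
print(fixed_point_residual(plain, z_prev, *args))
print(fixed_point_residual(refined, z_prev, *args))
```

In this toy setting the inversion step is a contraction, so the refined estimate lies closer to the implicit solution than the single-pass estimate. With a real model each iteration costs one network evaluation, so the iteration count trades against the number of inversion steps under a fixed budget.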
Result
ReNoise demonstrates superior reconstruction quality compared to existing sampler reversing methods, including DDIM inversion, for a fixed number of UNet operations. It also shows improved editability, enabling successful text-driven manipulations on real images, even with few-step models like SDXL Turbo and LCM LoRA. ReNoise is numerically stable, converges consistently, and outperforms other null-prompt inversion methods in terms of speed and accuracy.
Limitations and Future Work
The authors acknowledge that the edit enhancement and noise correction techniques in ReNoise require model-specific hyperparameter tuning. Future work includes more extensive testing with advanced editing methods and adapting ReNoise to video diffusion models.
Abstract
Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities. However, applying these methods to real images necessitates the inversion of the images into the domain of the pretrained diffusion model. Achieving faithful inversion remains a challenge, particularly for more recent models trained to generate images with a small number of denoising steps. In this work, we introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations. Building on reversing the diffusion sampling process, our method employs an iterative renoising mechanism at each inversion sampling step. This mechanism refines the approximation of a predicted point along the forward diffusion trajectory, by iteratively applying the pretrained diffusion model, and averaging these predictions. We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models. Through comprehensive evaluations and comparisons, we show its effectiveness in terms of both accuracy and speed. Furthermore, we confirm that our method preserves editability by demonstrating text-driven image editing on real images.