DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Authors: Kaiwen Zhang, Yifan Zhou, Xudong Xu, Xingang Pan, Bo Dai
What
This paper introduces DiffMorpher, a novel approach leveraging pre-trained diffusion models like Stable Diffusion to generate smooth and natural image morphing sequences.
Why
This paper is significant because it addresses a key limitation of diffusion models relative to GANs: their difficulty in smoothly interpolating between two images, which is essential for realistic image morphing and has applications in animation, entertainment, and data augmentation.
How
DiffMorpher works by first fine-tuning two LoRAs to capture the semantics of the two input images. It then interpolates between both the LoRA parameters and the latent noises obtained by DDIM inversion, ensuring smooth semantic and spatial transitions (see the sketch below). It further incorporates attention interpolation and replacement for texture consistency, AdaIN adjustment for color coherence, and a new sampling schedule for uniform transition speed.
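The following is a minimal sketch of the two core interpolations described above, assuming LoRA weight dictionaries are blended linearly and the DDIM-inverted latents are interpolated spherically (slerp). All tensors, names, and shapes are illustrative stand-ins, not the authors' implementation.

```python
# Sketch of DiffMorpher's core interpolation step (not the authors' code).
# Real LoRA weights and DDIM-inverted latents would come from a fine-tuned
# Stable Diffusion pipeline; plain tensors stand in for them here.
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, alpha: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherical interpolation between two latent noise tensors."""
    z0_flat, z1_flat = z0.flatten(), z1.flatten()
    cos_theta = torch.clamp(
        torch.dot(z0_flat, z1_flat) / (z0_flat.norm() * z1_flat.norm() + eps), -1.0, 1.0
    )
    theta = torch.acos(cos_theta)
    if theta.abs() < eps:  # nearly parallel vectors: fall back to linear interpolation
        return (1 - alpha) * z0 + alpha * z1
    return (torch.sin((1 - alpha) * theta) * z0 + torch.sin(alpha * theta) * z1) / torch.sin(theta)

def interpolate_lora(lora_a: dict, lora_b: dict, alpha: float) -> dict:
    """Linearly interpolate two LoRA state dicts with identical keys and shapes."""
    return {k: (1 - alpha) * lora_a[k] + alpha * lora_b[k] for k in lora_a}

# Toy stand-ins for the two fitted LoRAs and the two DDIM-inverted latents.
lora_a = {"unet.attn.lora_down": torch.randn(4, 320), "unet.attn.lora_up": torch.randn(320, 4)}
lora_b = {k: torch.randn_like(v) for k, v in lora_a.items()}
z_a, z_b = torch.randn(4, 64, 64), torch.randn(4, 64, 64)

# For each frame, the blended LoRA would be plugged into the UNet and the image
# denoised from the slerped latent.
for alpha in torch.linspace(0, 1, 16):
    lora_t = interpolate_lora(lora_a, lora_b, alpha.item())
    z_t = slerp(z_a, z_b, alpha.item())
    # image_t = denoise(unet_with(lora_t), z_t)  # done with the actual diffusion pipeline
```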
Result
DiffMorpher outperforms existing image morphing methods, achieving lower Fréchet Inception Distance (FID), Perceptual Path Length (PPL), and the newly proposed Perceptual Distance Variance (PDV) on the authors' MorphBench dataset. The approach produces high-quality, semantically consistent, and smooth image morphing sequences for diverse objects and styles, confirmed by both qualitative and quantitative evaluations, including a user study.
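For context, here is a hedged sketch of how such a smoothness metric can be computed, assuming PDV measures the variance of perceptual (LPIPS) distances between consecutive frames of the morphing sequence; the paper's exact definition and normalization may differ.

```python
# Hedged sketch of a PDV-style smoothness metric: low variance of LPIPS distances
# between adjacent frames indicates a uniform transition speed along the morph.
import torch
import lpips  # pip install lpips

def perceptual_distance_variance(frames: torch.Tensor) -> float:
    """frames: (N, 3, H, W) tensor in [-1, 1], ordered along the morphing path."""
    metric = lpips.LPIPS(net="alex")
    with torch.no_grad():
        dists = torch.stack(
            [metric(frames[i:i + 1], frames[i + 1:i + 2]).squeeze() for i in range(len(frames) - 1)]
        )
    return dists.var(unbiased=False).item()

frames = torch.rand(16, 3, 256, 256) * 2 - 1  # placeholder morphing sequence
print(perceptual_distance_variance(frames))
```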
Limitations & Future Work
Limitations include the per-pair LoRA training time and the reliance on text prompts. Future work could explore faster adaptation methods and incorporate correspondence information for challenging cases with unclear object alignment.
Abstract
Diffusion models have achieved remarkable image generation quality surpassing previous generative models. However, a notable limitation of diffusion models, in comparison to GANs, is their difficulty in smoothly interpolating between two image samples, due to their highly unstructured latent space. Such a smooth interpolation is intriguing as it naturally serves as a solution for the image morphing task with many applications. In this work, we present DiffMorpher, the first approach enabling smooth and natural image interpolation using diffusion models. Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition, where correspondence automatically emerges without the need for annotation. In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images. Extensive experiments demonstrate that DiffMorpher achieves starkly better image morphing effects than previous methods across a variety of object categories, bridging a critical functional gap that distinguished diffusion models from GANs.
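To illustrate the attention interpolation and injection idea mentioned in the abstract, the sketch below assumes that, when denoising an intermediate frame, self-attention keys and values cached from the two endpoint generations are blended and injected in place of the frame's own keys and values. Tensor names and shapes are illustrative, not the paper's implementation.

```python
# Hedged sketch of attention interpolation/injection for an intermediate frame.
import torch
import torch.nn.functional as F

def attention_with_injection(q_t, k_a, v_a, k_b, v_b, alpha: float):
    """q_t: queries of the intermediate frame; (k_a, v_a) and (k_b, v_b): self-attention
    keys/values cached from the two endpoint images at the same layer and timestep."""
    k_t = (1 - alpha) * k_a + alpha * k_b  # blend keys from the two endpoints
    v_t = (1 - alpha) * v_a + alpha * v_b  # blend values from the two endpoints
    return F.scaled_dot_product_attention(q_t, k_t, v_t)

# Toy example: batch 1, 8 heads, 64 tokens, 40 channels per head.
q = torch.randn(1, 8, 64, 40)
k_a, v_a = torch.randn_like(q), torch.randn_like(q)
k_b, v_b = torch.randn_like(q), torch.randn_like(q)
out = attention_with_injection(q, k_a, v_a, k_b, v_b, alpha=0.5)
print(out.shape)  # torch.Size([1, 8, 64, 40])
```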