DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Authors: Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet, Amit Raj, Varun Jampani, Pramook Khungurn, Supasorn Suwajanakorn
What
This paper introduces DiffusionLight, a technique for estimating high dynamic range (HDR) lighting from a single image. It leverages pre-trained text-to-image diffusion models to inpaint a chrome ball into the scene and then unwraps the ball's mirror reflection to obtain an environment map.
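Unwrapping works because a mirror ball reflects almost the entire sphere of surrounding directions. Below is a minimal NumPy sketch of the inverse mapping under the paper's orthographic-projection assumption: for each equirectangular pixel we compute the reflected world direction, recover the ball normal as the halfway vector with the view direction, and look up the corresponding ball pixel. The axis and panorama conventions here are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def unwrap_chrome_ball(ball_img, pano_h=256, pano_w=512):
    """Unwrap a square mirror-ball crop into an equirectangular map.

    Assumes an orthographic camera looking down -z at a ball inscribed
    in ball_img. Coordinate conventions are illustrative assumptions.
    """
    H, W = ball_img.shape[:2]
    # One world direction R per panorama pixel (longitude/latitude grid).
    v, u = np.meshgrid(np.arange(pano_h), np.arange(pano_w), indexing="ij")
    lon = (u + 0.5) / pano_w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v + 0.5) / pano_h * np.pi
    Rx = np.cos(lat) * np.sin(lon)
    Ry = np.sin(lat)
    Rz = np.cos(lat) * np.cos(lon)
    # Invert the mirror reflection R = 2(N.V)N - V with V = (0, 0, 1):
    # the normal is the normalized halfway vector between R and V.
    Nx, Ny, Nz = Rx, Ry, Rz + 1.0
    norm = np.sqrt(Nx**2 + Ny**2 + Nz**2) + 1e-8
    Nx, Ny = Nx / norm, Ny / norm
    # Under orthographic projection the ball pixel is (Nx, Ny) directly.
    px = np.clip(((Nx + 1) / 2 * (W - 1)).round().astype(int), 0, W - 1)
    py = np.clip(((1 - (Ny + 1) / 2) * (H - 1)).round().astype(int), 0, H - 1)
    return ball_img[py, px]  # nearest-neighbor lookup
```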
Why
The paper addresses the limitations of current lighting estimation methods that rely on limited HDR panorama datasets, resulting in poor generalization to real-world, uncontrolled settings. By harnessing the vast image prior of diffusion models trained on billions of standard images, DiffusionLight demonstrates superior generalization and handles diverse in-the-wild scenarios effectively.
How
The authors use a depth-conditioned Stable Diffusion XL model to inpaint chrome balls, addressing the challenge of generating high-quality reflections. They introduce an iterative inpainting algorithm that locates suitable initial noise maps for consistent ball generation. For HDR prediction, they fine-tune the model with LoRA to perform exposure bracketing, generating multiple LDR chrome balls at varying exposure values, which are then merged to produce a linearized HDR output.
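The merging step follows standard exposure bracketing: each LDR ball is linearized, rescaled to a common exposure, and used to fill in pixels that brighter exposures clipped. Below is a minimal sketch of such a saturation-aware merge; the gamma value, saturation threshold, and example EV list are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def merge_exposures(ldr_balls, evs, gamma=2.4, sat_thresh=0.99):
    """Merge LDR chrome balls rendered at several EVs into linear HDR.

    ldr_balls: float arrays in [0, 1], ordered brightest (EV 0) first.
    evs: matching exposure values, e.g. [0, -2.5, -5].
    Gamma and saturation threshold are assumptions for this sketch.
    """
    hdr, valid = None, None
    for img, ev in zip(ldr_balls, evs):
        linear = img ** gamma          # undo display gamma
        linear = linear / (2.0 ** ev)  # rescale to a common exposure
        if hdr is None:
            hdr = linear
        else:
            # Darker exposures replace pixels the brighter ones clipped.
            hdr = np.where(valid, hdr, linear)
        # A pixel is trusted once any exposure captured it unsaturated.
        unsat = img < sat_thresh
        valid = unsat if valid is None else (valid | unsat)
    return hdr
```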
Result
DiffusionLight achieves competitive results on standard benchmarks (Laval Indoor and Poly Haven), outperforming StyleLight on Angular Error and Normalized RMSE. Notably, it generalizes well to in-the-wild images where existing methods struggle. The ablation study confirms that both the iterative inpainting algorithm and the LoRA fine-tuning contribute to the improved performance.
Limitations & Future Work
The paper acknowledges limitations such as the assumption of orthographic projection due to unknown camera parameters, occasional failure to reflect the environment in overhead images, and slow processing due to diffusion sampling. Future work includes supporting perspective projection, handling overhead views, and exploring faster, sampling-efficient diffusion models.
Abstract
We present a simple yet effective technique to estimate lighting in a single input image. Current techniques rely heavily on HDR panorama datasets to train neural networks to regress an input with limited field-of-view to a full environment map. However, these approaches often struggle with real-world, uncontrolled settings due to the limited diversity and size of their datasets. To address this problem, we leverage diffusion models trained on billions of standard images to render a chrome ball into the input image. Despite its simplicity, this task remains challenging: the diffusion models often insert incorrect or inconsistent objects and cannot readily generate images in HDR format. Our research uncovers a surprising relationship between the appearance of chrome balls and the initial diffusion noise map, which we utilize to consistently generate high-quality chrome balls. We further fine-tune an LDR diffusion model (Stable Diffusion XL) with LoRA, enabling it to perform exposure bracketing for HDR light estimation. Our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios.