PractiLight: Practical Light Control Using Foundational Diffusion Models

Tel Aviv University · MPI for Informatics

Short video showing our work.

Abstract

Light control in generated images is a difficult task, posing challenges that span the entire image and frequency spectrum. Most approaches tackle this problem by training on extensive yet domain-specific datasets, limiting the inherent generalization and applicability of the foundational backbones used. Instead, PractiLight is a practical approach that effectively leverages the foundational understanding of recent generative models for the task. Our key insight is that lighting relationships in an image are similar in nature to token interactions in self-attention layers, and hence are best represented there. Based on this and further analysis of the importance of early diffusion iterations, PractiLight trains a lightweight LoRA regressor to produce the direct light map for a given image, using a small set of training images. We then employ this regressor to incorporate the desired lighting into the generation of another image using Classifier Guidance. This careful design generalizes well to diverse lighting conditions and image domains. We demonstrate state-of-the-art quality and control with proven parameter and data efficiency compared to leading works over a wide variety of scene types. We hope this work affirms that image lighting can feasibly be controlled by tapping into foundational knowledge, enabling practical and general relighting.

example results

In a nutshell

Our main observation is that large diffusion models already understand light transport quite well; there is no need to finetune them over millions of images to achieve plausible relighting, which degrades generalization. To tap into this prior, we just need to carefully consider where (which layers) and when (which timesteps) to add guidance. This allows us to train a tiny regressor on a small-scale synthetic dataset to extract the direct-irradiance map, and use it to guide the generation process toward relighting images with dramatic effects, while preserving the identity and style of the original image. Our method produces relighting results on a wide range of image domains, with very little additional compute and no specialized or large-scale data.
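To make the guidance step concrete, here is a minimal, hypothetical PyTorch sketch of a classifier-guided denoising step. The names `unet`, `scheduler`, and `light_regressor` stand in for an SD 1.5 pipeline and the trained LoRA regressor; for simplicity the regressor here reads the latent directly rather than the self-attention features used in the paper, and `guidance_scale` and `t_guided` are illustrative hyperparameters, not the paper's values.

```python
# Hedged sketch of a classifier-guidance relighting step, not the authors' released code.
import torch

def relight_step(latents, t, text_emb, target_light,
                 unet, scheduler, light_regressor,
                 guidance_scale=1.0, t_guided=700):
    # Guidance is only applied at early (high-noise) timesteps, where lighting is decided.
    if t >= t_guided:
        latents = latents.detach().requires_grad_(True)
        noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
        # Predict the direct light map for the current estimate (hypothetical regressor call).
        pred_light = light_regressor(latents, t, text_emb)
        loss = torch.nn.functional.mse_loss(pred_light, target_light)
        grad = torch.autograd.grad(loss, latents)[0]
        # Classifier Guidance: nudge the latent so its predicted light map matches the target.
        latents = (latents - guidance_scale * grad).detach()
    else:
        with torch.no_grad():
            noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
    # Regular denoising step with the (possibly nudged) latent.
    return scheduler.step(noise_pred, t, latents).prev_sample
```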
Example comparisons: Original · Control · Relit

Light Transport Analysis

To investigate which layers encode light transport phenomena, we conducted a feature injection experiment: activations from the generation of relit images were injected into the generation process of the original image (from the same scene). We found that light transport effects are predominantly encoded in the self-attention layers, particularly in the decoder of a UNet-based diffusion model (e.g., SD 1.5). This is not surprising — we speculate that the many-to-many interactions in self-attention resemble light transport interactions, making these layers a natural locus for encoding such effects. In this sense, self-attention may act as an inductive bias for modeling light transport.
analysis
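To make the injection setup concrete, below is a minimal, hypothetical PyTorch sketch of how such activations can be recorded and re-injected with forward hooks, assuming a diffusers-style SD 1.5 UNet in which self-attention modules are named `attn1` (cross-attention is `attn2`); the exact mechanism used in the experiment may differ.

```python
# Hedged sketch of the feature-injection experiment. `cached` holds activations recorded
# while generating the relit image; during the second pass (original image) the hooks
# overwrite the matching self-attention outputs with those cached activations.
import torch

def register_injection_hooks(unet, cached, record=False):
    handles = []
    for name, module in unet.named_modules():
        if name.endswith("attn1"):  # self-attention blocks
            def hook(mod, inputs, output, name=name):
                if record:
                    cached[name] = output.detach()
                    return output
                # Inject the cached activation from the relit generation, if present.
                return cached.get(name, output)
            handles.append(module.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to restore normal behavior
```

Running the sampler once with `record=True` for the relit image and then with `record=False` for the original image replaces the self-attention outputs with the cached ones; restricting the loop to module names starting with `up_blocks` would correspond to the decoder-only setting discussed above.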

Validation

We validated our approach by creating a custom prompt-image dataset containing images from diverse domains. We compared our results to other state-of-the-art methods and measured aesthetics, control and identity adherence, and efficiency, finding our method highly competitive on all metrics. We further conducted a user study, which shows that our results are preferred over those of the other methods. We attribute the generalization and quality of the results to the small-scale training of our regressor.
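For illustration, a metric computation along these lines might look as follows; the specific choices (LPIPS for identity, an L1 error between the target light map and the regressor's prediction on the relit image for control) are our assumptions for this sketch, not necessarily the exact evaluation protocol.

```python
# Hypothetical evaluation sketch; `light_regressor` and the metric choices are assumptions.
import torch
import lpips

identity_metric = lpips.LPIPS(net="alex")  # perceptual distance, lower = closer identity

def evaluate(original, relit, target_light, light_regressor):
    # Images are (N, 3, H, W) tensors in [-1, 1]; light maps are (N, 1, H, W).
    with torch.no_grad():
        identity = identity_metric(original, relit).mean()
        control = torch.nn.functional.l1_loss(light_regressor(relit), target_light)
    return {"identity_lpips": identity.item(), "control_l1": control.item()}
```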
example results
example results

BibTeX