We introduce MV-CoLight, a two-stage framework for illumination-consistent object compositing in both 2D images and 3D scenes. Our novel feed-forward architecture models lighting and shadows directly, avoiding the iterative biases of diffusion-based methods. We employ a Hilbert curve-based mapping to seamlessly align 2D image inputs with 3D Gaussian scene representations. To facilitate training and evaluation, we further introduce a large-scale 3D compositing dataset. Experiments demonstrate state-of-the-art harmonization results on standard benchmarks and our dataset, while casually captured real-world scenes further demonstrate the framework's robustness and broad generalization.
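For intuition, a Hilbert curve-based mapping can be realized with the standard distance-to-coordinate conversion. The sketch below is our illustrative reading, not the paper's released code: it assigns each Gaussian, ordered along the curve, a locality-preserving pixel in a 2D grid, so that 2D network layers can process per-Gaussian features (the grid side `n` is an assumed parameter).

```python
def hilbert_d2xy(n, d):
    """Map curve distance d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate/flip the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Nearby curve distances map to nearby pixels, so serializing Gaussians
# along the curve preserves spatial locality in the resulting 2D layout.
n = 64                                    # grid side (assumed; must be 2**k)
pixel_of_gaussian = [hilbert_d2xy(n, d) for d in range(n * n)]
```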
Overview of MV-CoLight. (a) We insert a white puppy as the composite object onto the table between the basketballs, and render multi-view inharmonious images, background-only images, and depth maps along a camera trajectory moving from distant to close-up positions. (b) We feed single-view data into the 2D object compositing model, which processes it through multiple Swin Transformer blocks to output the harmonized result. (c) We project the multi-view features from the 2D model into Gaussian space via \(\Phi(\cdot)\), combine them with the original inharmonious Gaussian colors projected into 2D Gaussian color space through \(\Psi(\cdot)\), and then feed them into the 3D object compositing model. The model outputs harmonized Gaussian colors and computes a rendering loss by incorporating the Gaussian shape attributes.
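As a rough illustration of the 3D stage's data flow, a minimal PyTorch sketch follows. All shapes are assumptions, the sampling-based \(\Phi\) and grid-scatter \(\Psi\) are our stand-ins for the paper's operators, and a linear head substitutes for the actual compositing model:

```python
import torch
import torch.nn.functional as F

G, C, H, W = 4096, 32, 64, 64        # Gaussians, feature dim, grid size (assumed)
feat_2d = torch.randn(C, H, W)       # 2D-model features for one view
uv = torch.rand(G, 2) * 2 - 1        # Gaussians' projected pixel coords in [-1, 1]
inharm_rgb = torch.rand(G, 3)        # original inharmonious Gaussian colors

# Phi stand-in: lift 2D features to per-Gaussian features by bilinearly
# sampling each feature map at the Gaussians' projected locations.
gauss_feat = F.grid_sample(feat_2d[None], uv.view(1, 1, G, 2),
                           align_corners=True)     # (1, C, 1, G)
gauss_feat = gauss_feat.view(C, G).T               # (G, C)

# Psi stand-in: scatter per-Gaussian colors onto a 2D grid (e.g., in Hilbert
# order) so 2D layers can consume them alongside the lifted features.
order = torch.randperm(H * W)[:G]                  # stand-in for the Hilbert order
color_grid = torch.zeros(3, H * W)
color_grid[:, order] = inharm_rgb.T
color_grid = color_grid.view(3, H, W)

# The 3D compositing model maps the combined inputs to harmonized Gaussian
# colors; a linear head is used here purely as a placeholder.
head = torch.nn.Linear(C + 3, 3)
harmonized_rgb = head(torch.cat([gauss_feat, inharm_rgb], dim=1)).sigmoid()

# Training would then splat harmonized_rgb with the (fixed) Gaussian shape
# attributes and apply a photometric rendering loss against target views.
```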
Visualization of the DTC-MultiLight dataset. We showcase rendered results of diverse scenes created using objects from the DTC dataset within the Blender engine, highlighting multi-view perspectives and varying lighting conditions.
Single-view qualitative comparisons. Compared to the baselines, our method generates coherent illumination and plausible shadows while decoupling highlights from the inserted objects.
Multi-view qualitative comparisons. Our approach synthesizes multi-view consistent illumination and shadows while strictly preserving the original scene geometry, scale, and object placement.
Real-world scene visualization. We evaluate our method on real-world scenes and achieve both color harmonization and realistic lighting/shadow generation.
Visual results of inserting a luminous object. Our method simulates the illumination effects of luminous spheres within the scene environment.
Multi-view visualization. Our method meticulously simulates the emission effects of inserted light sources, their illumination on surrounding objects, and shadows.