Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

CVPR 2024, Highlight
1 Shanghai Artificial Intelligence Laboratory, 2 The Chinese University of Hong Kong, 3 Nanjing University, 4 Cornell University

TL;DR: We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum.

Our method converges faster, uses fewer primitives, and achieves better visual quality.

Our method achieves superior results on scenes with challenging viewing conditions, e.g. transparency, specularity, reflection, texture-less regions, and fine-scale details.

Method Overview

Framework. (a) We start by forming a sparse voxel grid from SfM-derived points. An anchor with a learnable scale is placed at the center of each voxel, roughly sculpting the scene occupancy. (b) Within a view frustum, k neural Gaussians are spawned from each visible anchor with offsets. Their attributes, i.e. opacity, color, scale, and quaternion, are then decoded by MLPs from the anchor feature and the relative camera-anchor viewing direction and distance. (c) To alleviate redundancy and improve efficiency, only non-trivial neural Gaussians are rasterized. The rendered image is supervised with reconstruction, structural similarity, and volume regularization losses.
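Steps (a) to (c) can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the per-axis anchor scale, the single linear layer standing in for the opacity MLP, the tanh activation, and the "opacity > 0" filter are all assumptions made for illustration.

```python
import numpy as np

def voxelize_anchors(points, voxel_size):
    """(a) One anchor per occupied voxel, placed at the voxel center."""
    idx = np.unique(np.floor(points / voxel_size).astype(np.int64), axis=0)
    return (idx + 0.5) * voxel_size

def spawn_positions(anchors, offsets, scales):
    """(b) k Gaussian centers per anchor: anchor + offset * anchor scale.

    anchors: (N, 3), offsets: (N, k, 3), scales: (N, 3) -> (N, k, 3).
    A per-axis scale is an assumption of this sketch.
    """
    return anchors[:, None, :] + offsets * scales[:, None, :]

def view_inputs(anchors, cam_pos):
    """Unit camera-to-anchor direction (N, 3) and distance (N, 1)."""
    delta = anchors - cam_pos
    dist = np.linalg.norm(delta, axis=1, keepdims=True)
    return delta / dist, dist

def decode_opacity(features, view_dir, dist, weight, bias):
    """Toy stand-in for the opacity MLP: one linear layer followed by tanh.

    features: (N, F), weight: (F + 4, k), bias: (k,) -> opacities (N, k).
    """
    x = np.concatenate([features, view_dir, dist], axis=1)
    return np.tanh(x @ weight + bias)

def nontrivial_mask(opacity, threshold=0.0):
    """(c) Keep only Gaussians whose opacity exceeds a (hypothetical) threshold."""
    return opacity > threshold
```

In this sketch the Gaussian attributes are never stored per primitive; they are recomputed from the anchor feature and the current camera pose on every call, which is what makes the representation view-adaptive. Color, scale, and quaternion would be decoded by analogous heads.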

Anchor refinement. We propose an error-based anchor growing policy that reliably adds new anchors in regions the neural Gaussians find significant. We quantize neural Gaussians into multi-resolution voxels and add new anchors to voxels whose gradients exceed level-wise thresholds. This strategy improves scene coverage without using excessive points.
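The growing rule can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `grow_anchors`, the use of the mean gradient per voxel, and the level schedule (voxel size divided by 4 and threshold multiplied by 2 at each finer level) are choices made for this sketch, not a statement of the exact procedure.

```python
import numpy as np

def grow_anchors(gauss_xyz, gauss_grad, anchors, voxel_size, grad_thresh, levels=3):
    """Add anchors in voxels whose mean Gaussian gradient exceeds a level-wise threshold.

    gauss_xyz: (G, 3) neural Gaussian centers, gauss_grad: (G,) gradient magnitudes,
    anchors: (N, 3) existing anchors. Returns the enlarged anchor array.
    """
    anchors = np.asarray(anchors, dtype=float)
    for m in range(levels):
        size = voxel_size / 4 ** m      # assumed schedule: finer voxels per level
        thresh = grad_thresh * 2 ** m   # assumed schedule: stricter threshold per level

        # Quantize Gaussians into voxels and average their gradients per voxel.
        idx = np.floor(gauss_xyz / size).astype(np.int64)
        vox, inv = np.unique(idx, axis=0, return_inverse=True)
        inv = inv.reshape(-1)
        mean_grad = np.bincount(inv, weights=gauss_grad) / np.bincount(inv)

        # Candidate voxels: high gradient and not already holding an anchor.
        hot = vox[mean_grad > thresh]
        occupied = {tuple(v) for v in np.floor(anchors / size).astype(np.int64)}
        fresh = np.array([v for v in hot if tuple(v) not in occupied],
                         dtype=np.int64).reshape(-1, 3)

        # New anchors sit at the centers of the selected voxels.
        anchors = np.concatenate([anchors, (fresh + 0.5) * size], axis=0)
    return anchors
```

Skipping voxels that already contain an anchor is what keeps the point count from growing wherever gradients stay high; only uncovered regions receive new anchors.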


Scaffold-GS rendering results on various types of scenes. Challenging cases, including texture-less areas, insufficient observations, fine-scale details, view-dependent lighting effects, and multi-scale observations, are reasonably handled.

Scaffold-GS is more robust to view-dependent effects (e.g. reflection, shadowing) and alleviates artifacts (e.g. floaters, structural errors) caused by redundant 3D Gaussians.

Analysis of anchor features. The clustered anchor features exhibit clues about scene content, showing that our approach improves the interpretability of 3D-GS models and has the potential to be scaled up to much larger scenes by exploiting reusable features. Further findings are reported in our paper.