PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction

1 Zhejiang University, 2 Shanghai Artificial Intelligence Laboratory, 3 Shanghai Jiao Tong University, 4 University of Science and Technology of China, 5 The Chinese University of Hong Kong, 6 The University of Hong Kong
* denotes equal contribution. † denotes corresponding authors.

TL;DR: PLANING introduces a loosely coupled triangle-Gaussian representation and a monocular streaming framework that jointly achieves accurate geometry, high-fidelity rendering, and efficient planar abstraction for embodied AI applications.

Our design yields a compact yet expressive scene representation, enabling photorealistic rendering and structurally coherent geometry for downstream simulation and robotics applications.



Method Overview

Pipeline of PLANING. PLANING adopts a hybrid representation in which triangles explicitly model scene geometry, while neural Gaussians decoded from these triangles render appearance. Built upon this representation, we develop a streaming reconstruction framework that takes unposed monocular image sequences as input and comprises a frontend for camera tracking, a backend for global pose optimization, and a mapper for scene reconstruction. Specifically, the mapper incorporates an efficient primitive initialization strategy to reduce redundancy. The reconstructed triangle soup further enables efficient planar abstraction, facilitating a range of downstream tasks.
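To make the idea of Gaussians attached to triangles concrete, the following is a minimal sketch of one way a Gaussian center could be anchored to a triangle: a barycentric position on the triangle plus a signed offset along its normal. This parameterization, the function names (gaussian_centers, triangle_normal), and the hand-picked weights and offsets are assumptions for illustration only; in PLANING the Gaussian attributes are decoded by a neural network, which this sketch does not model.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def triangle_normal(tri: Triangle) -> Vec3:
    """Unit normal of the triangle (A, B, C), following the right-hand rule."""
    a, b, c = tri
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length == 0.0:
        raise ValueError("degenerate triangle has no normal")
    return (n[0] / length, n[1] / length, n[2] / length)


def gaussian_centers(tri: Triangle,
                     weights: Sequence[Vec3],
                     offsets: Sequence[float]) -> List[Vec3]:
    """Place one center per (barycentric weights, normal offset) pair.

    The weights and offsets stand in for decoder outputs (an assumption);
    here they are plain numbers supplied by the caller.
    """
    normal = triangle_normal(tri)
    a, b, c = tri
    centers: List[Vec3] = []
    for (w0, w1, w2), d in zip(weights, offsets):
        total = w0 + w1 + w2  # normalise so the weights sum to one
        w0, w1, w2 = w0 / total, w1 / total, w2 / total
        centers.append((
            w0 * a[0] + w1 * b[0] + w2 * c[0] + d * normal[0],
            w0 * a[1] + w1 * b[1] + w2 * c[1] + d * normal[1],
            w0 * a[2] + w1 * b[2] + w2 * c[2] + d * normal[2],
        ))
    return centers
```

Because every center is expressed relative to its triangle, moving a triangle moves its Gaussians with it, while a small normal offset keeps them close to the surface.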

Structure Design

Triangle Representation. Compared to surfels, our representation produces clearer, opaque surfaces and enables finer rendering details.

Hybrid Structure. Our design effectively reduces redundancy and encourages Gaussians to concentrate around the underlying surface.



Applications


Efficient Locomotion Strategy Training. The reconstructed planar scenes serve as high-fidelity, simulation-ready environments that support stable and scalable reinforcement learning for humanoid walking and quadruped stair climbing.

Large-Scale Scene Reconstruction. Dynamic primitive loading allows efficient large-scale reconstruction beyond GPU memory limits.
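As a rough illustration of dynamic primitive loading, the sketch below keeps only the primitives near the camera resident and reports which spatial chunks to load or evict as the camera moves. The uniform 2D grid, the square neighbourhood, and the names (ResidentSet, chunk_size, radius) are hypothetical choices made for this example; the actual loading policy used by PLANING is not described here.

```python
import math
from typing import Set, Tuple

Chunk = Tuple[int, int]


def chunk_of(x: float, z: float, chunk_size: float) -> Chunk:
    """Index of the grid cell that contains the ground-plane point (x, z)."""
    return (math.floor(x / chunk_size), math.floor(z / chunk_size))


def chunks_in_range(camera_xz: Tuple[float, float],
                    radius: float,
                    chunk_size: float) -> Set[Chunk]:
    """Square block of chunks around the camera covering at least `radius`."""
    cx, cz = chunk_of(camera_xz[0], camera_xz[1], chunk_size)
    r = math.ceil(radius / chunk_size)
    return {(cx + dx, cz + dz)
            for dx in range(-r, r + 1)
            for dz in range(-r, r + 1)}


class ResidentSet:
    """Tracks which chunks are currently held in (simulated) GPU memory."""

    def __init__(self, chunk_size: float, radius: float) -> None:
        self.chunk_size = chunk_size
        self.radius = radius
        self.resident: Set[Chunk] = set()

    def update(self, camera_xz: Tuple[float, float]):
        """Return (chunks to load, chunks to evict) for the new camera position."""
        wanted = chunks_in_range(camera_xz, self.radius, self.chunk_size)
        to_load = wanted - self.resident
        to_evict = self.resident - wanted
        self.resident = wanted
        return to_load, to_evict
```

Calling update once per keyframe bounds the resident set by the neighbourhood size rather than by the size of the whole scene, which is the property that lets reconstruction continue past a fixed memory budget.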

Plane-Guided Camera Pose Optimization. By feeding reconstructed planar structures back into the frontend, we close the loop between mapping and tracking and significantly improve global consistency.
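One common way to let planes constrain a pose is a point-to-plane residual, sketched below. The formulation is an assumption for illustration and is not confirmed as the loss PLANING uses: a plane is written as n·x + d = 0 with unit normal n, a candidate pose (R, t) maps a point p to Rp + t, and the residual is the signed distance of the transformed point to the plane. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform q = R p + t."""
    return (
        R[0][0] * p[0] + R[0][1] * p[1] + R[0][2] * p[2] + t[0],
        R[1][0] * p[0] + R[1][1] * p[1] + R[1][2] * p[2] + t[1],
        R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + t[2],
    )


def point_to_plane_residual(normal: Vec3, offset: float,
                            R: Mat3, t: Vec3, p: Vec3) -> float:
    """Signed distance of the transformed point to the plane n.x + d = 0."""
    q = transform(R, t, p)
    return normal[0] * q[0] + normal[1] * q[1] + normal[2] * q[2] + offset


def plane_cost(normal: Vec3, offset: float,
               R: Mat3, t: Vec3, points: Sequence[Vec3]) -> float:
    """Sum of squared point-to-plane residuals for one plane."""
    return sum(point_to_plane_residual(normal, offset, R, t, p) ** 2
               for p in points)
```

A pose optimizer would minimize this cost over (R, t) across all plane and point associations; the cost is zero exactly when every transformed point lies on its plane.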

Reconstruction Results

Compact Geometry from Triangles


Ours (Triangles) vs. 2DGS* (Mesh): representation of scene geometry.

Ours (Triangles) vs. PlanarSplatting (Rectangles): representation of scene geometry.




Rendering Results

High-Fidelity Rendering from Gaussians


PLANING outperforms baselines, faithfully reconstructing fine structures and complex planar surfaces, such as wall-mounted mirrors and intricate ceiling ornaments, that those methods struggle to capture.