Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

1 University of Science and Technology of China, 2 Shanghai Jiao Tong University, 3 Shanghai Artificial Intelligence Laboratory, 4 The Chinese University of Hong Kong,
5 Brown University, 6 The University of Hong Kong

TL;DR: We introduce Horizon-GS, which tackles unified reconstruction and rendering of aerial and street views with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes.

Our method delivers an immersive and seamless experience for city roaming while achieving high-quality rendering and reconstruction of aerial-to-ground scenes.

The colored camera trajectories depict novel viewpoints, with the reconstructed mesh overlaid on the scene, while the surrounding images display the predicted views for each trajectory.



Abstract

Seamless integration of aerial and street view images remains a significant challenge in neural scene reconstruction and rendering. Existing methods predominantly focus on a single domain, limiting their applicability to immersive environments, which demand extensive free viewpoint exploration with large view changes both horizontally and vertically. We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques that tackles unified reconstruction and rendering for aerial and street views. Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes. We also curate a high-quality aerial-to-ground dataset encompassing both synthetic and real-world scenes to advance further research. Experiments across diverse urban scene datasets confirm the effectiveness of our method.



Method Overview

Illustration of our proposed Horizon-GS: We divide large-scale scenes into chunks. For each chunk, we initialize LOD-structured anchors and run a coarse-to-fine training process: the coarse stage reconstructs the overall scene, while the fine stage enhances street-view details (highlighted in purple). RGB, depth, and normal images can be derived by applying different primitive attributes (2D/3D Gaussians) to a single shared underlying structure.
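To make the chunked, two-stage schedule concrete, here is a minimal Python sketch of the training loop structure. This is an illustration under our own naming assumptions, not the released Horizon-GS code: `Chunk`, `init_lod_anchors`, `train_step`, and the iteration counts are all hypothetical placeholders.

```python
# Minimal sketch of the chunked coarse-to-fine schedule described above.
# All classes, functions, and iteration counts are hypothetical placeholders
# for illustration; they are not the released Horizon-GS API.

from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One spatial chunk of the large-scale scene."""
    anchors: list = field(default_factory=list)       # LOD-structured anchors
    aerial_views: list = field(default_factory=list)  # calibrated aerial images
    street_views: list = field(default_factory=list)  # calibrated street images

def init_lod_anchors(chunk: Chunk) -> None:
    """Placeholder: seed multi-level anchors from the chunk's point cloud."""
    chunk.anchors = [f"anchor_lod{lod}" for lod in range(3)]

def train_step(chunk: Chunk, views: list, stage: str) -> None:
    """Placeholder for one optimization step over the given views."""
    pass  # render the views, compute photometric/geometric losses, backprop

def train_chunk(chunk: Chunk, coarse_iters: int = 30_000, fine_iters: int = 10_000) -> None:
    init_lod_anchors(chunk)
    # Coarse stage: reconstruct the overall scene from all viewpoints.
    for _ in range(coarse_iters):
        train_step(chunk, chunk.aerial_views + chunk.street_views, stage="coarse")
    # Fine stage: refine street-level detail from street views only.
    for _ in range(fine_iters):
        train_step(chunk, chunk.street_views, stage="fine")

if __name__ == "__main__":
    scene_chunks = [Chunk() for _ in range(4)]  # scene divided into chunks
    for chunk in scene_chunks:
        train_chunk(chunk)
```

The key structural point is that both stages optimize the same shared anchor structure per chunk; only the set of supervising views and the training objective change between stages.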



Data Capture

Visualization of our constructed dataset: All 7 scenes contain calibrated aerial and street-view images. We illustrate each scene with its point cloud and the corresponding image capture poses. Aerial-view trajectories are shown in purple, street-view trajectories in yellow. The dataset comprises 5 synthetic scenes (a-e) and 2 real scenes (f-g).
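As a rough sketch of how such a capture might be organized for training, the snippet below groups a scene's calibrated images by trajectory type. The directory layout, file extension, and field names are assumptions made for illustration, not the dataset's actual schema.

```python
# Hypothetical loader sketch for an aerial-to-ground capture like the one
# described above: each scene holds calibrated aerial and street images.
# Layout and field names are illustrative assumptions, not the real schema.

from dataclasses import dataclass
from pathlib import Path

@dataclass
class CalibratedImage:
    path: Path
    pose: list       # 4x4 camera-to-world matrix, flattened (from calibration)
    view_type: str   # "aerial" or "street"

def load_scene(scene_dir: Path) -> list[CalibratedImage]:
    """Group a scene's images by capture trajectory (aerial vs. street)."""
    images = []
    for view_type in ("aerial", "street"):
        for img_path in sorted((scene_dir / view_type).glob("*.png")):
            pose = [0.0] * 16  # placeholder; real poses come from calibration
            images.append(CalibratedImage(img_path, pose, view_type))
    return images

if __name__ == "__main__":
    scene = load_scene(Path("scenes/scene_a"))  # hypothetical path
    print(f"loaded {len(scene)} calibrated images")
```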

Results

Rendering Performance

Compared to the baselines, Horizon-GS successfully captures fine details in the scene, particularly objects with thin structures such as trees and decorative text, from delicate scenes (a) to large-scale scenes (b).


Surface Performance

Thanks to the two-stage training approach, Horizon-GS delivers geometrically accurate, artifact-free reconstructions. In contrast, 2D-GS introduces artifacts, resulting in incomplete and lackluster geometry.