OmniCity: Omnipotent City Understanding with
Multi-level and Multi-view Images

Weijia Li¹ Yawen Lai² Linning Xu³ Yuanbo Xiangli³ Jinhua Yu¹ Conghui He²,⁴ Gui-Song Xia⁵ Dahua Lin³,⁴

Sun Yat-sen University¹ SenseTime Research² The Chinese University of Hong Kong³
Shanghai AI Laboratory⁴ Wuhan University⁵

Abstract

This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. More precisely, OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K well-aligned, pixel-wise annotated images collected from 25K geo-locations in New York City. To reduce the substantial effort of pixel-wise annotation, we propose an efficient street-view image annotation pipeline that leverages the existing label maps of the satellite view and the transformation relations between different views (satellite, panorama, and mono-view). With the new OmniCity dataset, we provide benchmarks for a variety of tasks, including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation. Compared with existing multi-level and multi-view benchmarks, OmniCity contains a larger number of images with richer annotation types and more views, provides more benchmark results obtained from state-of-the-art models, and introduces a novel task for fine-grained building instance segmentation on street-level panorama images. Moreover, OmniCity provides new problem settings for existing tasks, such as cross-view image matching, synthesis, segmentation, and detection, and facilitates the development of new methods for large-scale city understanding, reconstruction, and simulation.
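The abstract mentions transformation relations between panorama and mono-view images. One common way to derive a mono-view image, or a mono-view label map, from a 360° panorama is to re-project an equirectangular panorama through a virtual pinhole camera. The sketch below illustrates that general technique only; it is not the OmniCity pipeline. The function name `pano_to_perspective`, the equirectangular layout (centre column at azimuth 0, top row at elevation +90°), and the yaw/pitch conventions are all assumptions made for illustration.

```python
import numpy as np


def pano_to_perspective(pano, yaw_deg, pitch_deg, fov_deg, out_w, out_h):
    """Sample a pinhole (mono-view) image or label map from an equirectangular panorama.

    Assumed layout (hypothetical, not taken from OmniCity): column 0 is azimuth -180 deg,
    the centre column is azimuth 0, row 0 is elevation +90 deg. Positive yaw turns right,
    positive pitch looks up. Nearest-neighbour lookup keeps integer labels intact.
    """
    H, W = pano.shape[:2]
    # Focal length in pixels from the horizontal field of view.
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)

    # Pixel grid centred on the principal point (x right, y down).
    xs = np.arange(out_w) - (out_w - 1) / 2
    ys = np.arange(out_h) - (out_h - 1) / 2
    x, y = np.meshgrid(xs, ys)

    # Unit ray directions in camera coordinates (z forward).
    d = np.stack([x, y, np.full_like(x, f)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)

    # Rotate rays: pitch about the x axis first, then yaw about the y axis.
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(t), 0, np.sin(t)],
                   [0, 1, 0],
                   [-np.sin(t), 0, np.cos(t)]])
    d = d @ (ry @ rx).T

    # Ray direction -> spherical angles -> panorama pixel indices.
    lon = np.arctan2(d[..., 0], d[..., 2])            # azimuth in [-pi, pi]
    lat = -np.arcsin(np.clip(d[..., 1], -1.0, 1.0))   # elevation (camera y points down)
    u = np.floor((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    v = np.clip(np.floor((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return pano[v, u]
```

For example, `pano_to_perspective(labels, yaw_deg=90, pitch_deg=10, fov_deg=90, out_w=512, out_h=512)` would extract a 512×512 view facing 90° to the right of the panorama centre. Nearest-neighbour lookup is used instead of bilinear interpolation so that integer class or instance IDs are copied, never blended into meaningless intermediate values.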

Comparison with Current Benchmarks

A comparison of our proposed dataset with existing city-related datasets. The # Images column gives the number of annotated images. The Street View column indicates whether a dataset contains no street-level images, mono-view (mono) images, or panorama (pano) images. The Satellite View column indicates whether it contains no, single-view, or multi-view satellite images. The Annotation Level column indicates which level of task the dataset is designed for: semantic segmentation, object detection (bbox), instance segmentation, plane segmentation, or image classification. The last two columns indicate whether a dataset contains fine-grained land use labels and height labels. Compared with existing benchmarks, OmniCity contains more images, more types of views, and richer annotation types at a finer annotation level.

Annotation Tool

In the first stage, the annotator drags the floor line to fit the bottom boundary of all buildings. Next, the annotator adds split lines between building planes and adjusts the top line to fit the roof boundary of each plane. The bottom-right sub-window provides auxiliary information indicating the approximate locations of the split lines, generated by transforming the building footprint split lines from the satellite view to the panorama view with a geo-transformation method. Annotators consider both this auxiliary information and the building appearance (e.g., texture discrepancies, doors) to decide the accurate location of each split line.

In the attribute assignment stage, the annotator adds the attributes (instance ID, block-lot ID, and land use type) to each building plane labeled in the previous stage; these attributes are displayed in the bottom-middle sub-window. Planes that belong to the same building instance, as happens in crossroads scenes, receive the same instance ID; otherwise, each plane is assigned a new instance ID in sequence. When the annotator selects a building instance (shown in yellow in the panorama image), the auxiliary lines surrounding its corresponding footprint in the bottom-right sub-window turn red. The annotator then assigns the block-lot ID and land use type according to the numbers shown in the bottom-right sub-window, which can be switched between land use mode and block-lot mode.
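The exact geo-transformation used to place the auxiliary split lines is not specified here. As an illustration only, the sketch below assumes an upright equirectangular panorama whose centre column faces the camera's compass heading, with columns increasing clockwise, and projects a footprint vertex given as latitude/longitude to the panorama column where its vertical building edge would appear. It uses a local flat-earth approximation, which is reasonable over street-scale distances. The function name `geo_to_pano_column` and all of its parameters are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def geo_to_pano_column(cam_lat, cam_lon, cam_heading_deg, pt_lat, pt_lon, pano_width):
    """Return the panorama column where a vertical building edge at (pt_lat, pt_lon) appears.

    Hypothetical conventions: the centre column faces cam_heading_deg (compass bearing,
    clockwise from north) and column indices increase clockwise. A local equirectangular
    (flat-earth) approximation converts lat/lon differences to metres.
    """
    # Local east/north offsets of the footprint vertex from the camera, in metres.
    d_lat = math.radians(pt_lat - cam_lat)
    d_lon = math.radians(pt_lon - cam_lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(cam_lat))

    # Compass bearing from camera to vertex, in [0, 360).
    bearing = math.degrees(math.atan2(east, north)) % 360.0

    # Angle relative to the panorama's left edge; 180 means straight ahead (centre column).
    rel = (bearing - cam_heading_deg + 180.0) % 360.0
    return int(rel / 360.0 * pano_width) % pano_width
```

Because vertical building edges project to single columns in an upright equirectangular panorama, a column index is enough to draw an approximate split line. Note that an error in the recorded camera heading shifts every projected line by the same angle.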

Example Results

In this work, we provide benchmarks for multiple satellite-level and street-level tasks. The satellite-level tasks include building footprint segmentation and height estimation; for both, we conduct experiments on satellite images captured from three view angles. For the street-level tasks, we conduct two instance segmentation tasks (land use and building instance segmentation) on the panorama images, and three (land use, building instance, and plane segmentation) on the mono-view images. Please note that these are only preliminary experimental results on the OmniCity dataset. Benchmarks of more recent models and additional tasks will be continuously added to the OmniCity homepage.

BibTeX

@article{li2022omnicity,
    title={OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images},
    author={Li, Weijia and Lai, Yawen and Xu, Linning and Xiangli, Yuanbo and Yu, Jinhua and He, Conghui and Xia, Gui-Song and Lin, Dahua},
    journal={arXiv e-prints},
    pages={arXiv--2208},
    year={2022}
}

Links: Arxiv · CVPR 2023 · Data · Code
