D2GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction


1 Wuhan University; 2 Shanghai Jiao Tong University; 3 Tongji University; 4 Bosch; 5 Nanyang Technological University

*Equal Contribution Corresponding Author
Examples of LiDAR acquisition issues and D2GS reconstruction comparison

TL;DR: D2GS reconstructs dynamic urban scenes without LiDAR by turning multi-view image depth into dense, refined geometric supervision for Gaussian Splatting.

Abstract: Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensor inputs, i.e., LiDAR and images. Although the geometry prior provided by LiDAR point clouds largely mitigates the ill-posedness of reconstruction, acquiring accurate LiDAR data remains challenging in practice: i) precise spatiotemporal calibration between LiDAR and other sensors is required, as they may not capture data simultaneously; ii) reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D2GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. First, we initialize a dense point cloud by back-projecting multi-view metric depth predictions. This point cloud is then optimized by a Progressive Pruning strategy to improve global consistency. Second, we jointly refine Gaussian geometry and the predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by the Gaussians. In turn, the enhanced depths provide stronger geometric constraints during Gaussian training. Finally, we improve the accuracy of ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art methods, producing more accurate geometry even when compared with methods that use ground-truth LiDAR data.
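
The initialization step mentioned in the abstract, back-projecting metric depth predictions into a point cloud, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `backproject_depth`, the pinhole intrinsics `K`, the camera-to-world pose convention, and the use of integer pixel coordinates are all assumptions made for illustration.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) metric depth map to world-space 3D points.

    Hypothetical helper, assuming a pinhole camera:
      depth        : (H, W) z-depth in meters
      K            : (3, 3) intrinsics matrix
      cam_to_world : (4, 4) camera-to-world pose
    Returns an (N, 3) array for pixels with positive, finite depth.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # (N, 4) homogeneous
    # Row-vector form of applying the pose: p_world = cam_to_world @ p_cam.
    pts_world = pts_cam @ cam_to_world.T
    return pts_world[:, :3]
```

Calling this once per camera view and concatenating the results would give a dense multi-view point cloud. Any consistency filtering on top of it, such as the Progressive Pruning strategy, is outside the scope of this sketch.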

Methodology

Methodology Illustration

Overview: We first employ a Progressive Pruning strategy to obtain a robust global Gaussian initialization. A Road Node is incorporated into the scene graph structure to regularize the road region using strong geometric priors. During training, Gaussian optimization and depth refinement are performed iteratively, allowing depth to be learned jointly from Gaussian supervision and enhanced by diffusion priors from a pretrained depth foundation model.
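
To make the road regularization idea more concrete, the sketch below shows one common way to constrain the shape and normal of Gaussians: penalize the smallest scale so each Gaussian flattens into a disk, and align the axis of that smallest scale with an assumed up direction. The exact losses used by D2GS are not given on this page, so the function `road_regularizer`, the choice of `up = (0, 0, 1)`, and both penalty terms are assumptions for illustration only.

```python
import numpy as np

def road_regularizer(scales, rotations, up=(0.0, 0.0, 1.0)):
    """Hypothetical flatness and normal-alignment penalties for road Gaussians.

    scales    : (N, 3) positive axis lengths of each Gaussian
    rotations : (N, 3, 3) rotation matrices whose columns are the Gaussian
                axes expressed in the world frame
    up        : assumed road normal direction in the world frame
    Returns (flatness, alignment), two scalars to be minimized.
    """
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    # Treat the axis with the smallest scale as the Gaussian's normal.
    idx = np.argmin(scales, axis=1)                         # (N,)
    normals = rotations[np.arange(len(scales)), :, idx]     # (N, 3)
    # Flatness: push the smallest scale toward zero (disk-like shape).
    flatness = scales.min(axis=1).mean()
    # Alignment: 0 when the normal is parallel to `up`, 1 when perpendicular.
    alignment = (1.0 - np.abs(normals @ up)).mean()
    return flatness, alignment
```

In a training loop, these two terms would typically be weighted and added to the photometric and depth losses, and applied only to Gaussians assigned to the road region.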

Results

Depth / RGB Reconstruction

Comparison of image reconstruction and depth estimation performance

Visual comparison for RGB reconstruction and depth estimation. The highlighted regions show that D2GS can recover cleaner geometry without relying on LiDAR input.

Dynamic Visual Results

Scene-025 — Depth

Scene-025 — RGB

Depth Enhancement

Iterative visualization of the rendered depths and the corresponding depth-completion outputs. The Depth Enhancer provides stronger pseudo-supervision as Gaussian optimization proceeds.
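
As a rough illustration of how enhanced depth can act as pseudo-supervision, the sketch below computes a masked L1 difference between a depth map rendered from the Gaussians and an enhanced depth map. The L1 form, the validity mask, and the name `depth_l1_loss` are assumptions. The loss actually used by D2GS is not specified on this page.

```python
import numpy as np

def depth_l1_loss(rendered, enhanced, mask=None):
    """Hypothetical masked L1 loss between rendered and enhanced depth.

    rendered : (H, W) depth rendered from the Gaussians
    enhanced : (H, W) pseudo ground-truth depth from the enhancer
    mask     : optional (H, W) boolean array of pixels to supervise;
               defaults to pixels where the enhanced depth is positive
    """
    if mask is None:
        mask = enhanced > 0
    if not mask.any():
        # No supervised pixels: contribute nothing to the total loss.
        return 0.0
    return float(np.abs(rendered[mask] - enhanced[mask]).mean())
```

In an iterative scheme like the one visualized above, `enhanced` would be refreshed periodically from the current rendered depth, so the supervision target improves as training proceeds.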

Novel View Synthesis

NVS examples highlight that reconstruction metrics alone are not sufficient; road-lane consistency and view synthesis quality also benefit from reliable depth regularization.

BibTeX

@article{xia2025d,
  title={D$^2$GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction},
  author={Xia, Kejing and Jia, Jidong and Jin, Ke and Bai, Yucai and Sun, Li and Tao, Dacheng and Zhang, Youjian},
  journal={arXiv preprint arXiv:2510.25173},
  year={2025}
}