Abstract
DA², a zero-shot generalizable and fully end-to-end panoramic depth estimator, addresses challenges in panoramic depth estimation by using a data curation engine and SphereViT to handle spherical distortions, achieving state-of-the-art performance.
Panoramas have a full FoV (360°×180°), offering a more complete visual description than perspective images. Thanks to this characteristic, panoramic depth estimation is gaining increasing traction in 3D vision. However, due to the scarcity of panoramic data, previous methods are often restricted to in-domain settings, leading to poor zero-shot generalization. Furthermore, due to the spherical distortions inherent in panoramas, many approaches rely on perspective splitting (e.g., cubemaps), which leads to suboptimal efficiency. To address these challenges, we propose DA²: Depth Anything in Any Direction, an accurate, zero-shot generalizable, and fully end-to-end panoramic depth estimator. Specifically, to scale up panoramic data, we introduce a data curation engine that generates high-quality panoramic depth data from perspective imagery, creating ~543K panoramic RGB-depth pairs and bringing the total to ~607K. To further mitigate the spherical distortions, we present SphereViT, which explicitly leverages spherical coordinates to enforce spherical geometric consistency in panoramic image features, yielding improved performance. A comprehensive benchmark on multiple datasets clearly demonstrates DA²'s SoTA performance, with an average 38% improvement on AbsRel over the strongest zero-shot baseline. Surprisingly, DA² even outperforms prior in-domain methods, highlighting its superior zero-shot generalization. Moreover, as an end-to-end solution, DA² exhibits much higher efficiency than fusion-based approaches. Both the code and the curated panoramic data will be released. Project page: https://depth-any-in-any-dir.github.io/.
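To make the SphereViT idea concrete, here is a minimal sketch of computing per-pixel spherical coordinates for an equirectangular panorama — the kind of geometric signal a model could condition image features on. The function name, pixel-center sampling convention, and angle ranges are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def equirect_spherical_coords(h, w):
    """Per-pixel spherical angles for an h x w equirectangular panorama.

    Assumed convention (illustrative): longitude theta spans the full
    360 deg in (-pi, pi), latitude phi spans the full 180 deg in
    (-pi/2, pi/2), with pixels sampled at their centers (half-pixel
    offsets). Returns two (h, w) arrays.
    """
    u = (np.arange(w) + 0.5) / w      # horizontal position in (0, 1)
    v = (np.arange(h) + 0.5) / h      # vertical position in (0, 1)
    theta = (u - 0.5) * 2.0 * np.pi   # longitude: left edge -> -pi, right edge -> +pi
    phi = (0.5 - v) * np.pi           # latitude: top row -> +pi/2, bottom row -> -pi/2
    # meshgrid with default 'xy' indexing yields (h, w) grids
    return np.meshgrid(theta, phi)

theta, phi = equirect_spherical_coords(512, 1024)
```

Such a grid (or a sinusoidal embedding of it) gives every pixel an explicit position on the sphere, so attention can account for the latitude-dependent stretching of the equirectangular projection instead of treating the image as a flat plane.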
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention (2025)
- CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion (2025)
- BRIDGE: Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation (2025)
- Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds (2025)
- RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization (2025)
- G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration (2025)
- PanoWorld-X: Generating Explorable Panoramic Worlds via Sphere-Aware Video Diffusion (2025)