WYFDUT/SPAN

SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection


Yifan Wang1,4, Yian Zhao2, Fanqi Pu1, Xiaochen Yang3, Yang Tang4,†, Xi Chen4, Wenming Yang1,†

1Tsinghua University    2Peking University    3University of Glasgow    4Tencent BAC   

† Corresponding Authors

📧 yf-wang23@mails.tsinghua.edu.cn    yang.wenming@sz.tsinghua.edu.cn


📌 Introduction

Figure: Overview of SPAN

This repository hosts the official implementation of SPAN (Spatial-Projection Alignment), a novel framework for monocular 3D object detection that addresses the geometric consistency constraints overlooked in existing decoupled regression paradigms.

SPAN introduces a unified geometric consistency optimization paradigm that comprises two pivotal components:

  • Spatial Point Alignment: Enforces an explicit global spatial constraint between predicted and ground-truth 3D bounding boxes by aligning their eight corner coordinates in the camera coordinate system, thereby rectifying spatial drift caused by decoupled attribute regression.

  • 3D-2D Projection Alignment: Ensures that the projected 3D box is aligned tightly within its corresponding 2D detection bounding box on the image plane, mitigating projection misalignment overlooked in previous works.
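As a rough illustration of Spatial Point Alignment, the sketch below builds the eight corners of a 3D box in camera coordinates and compares predicted and ground-truth corners. It assumes the KITTI convention (box location at the bottom center, y pointing down, `ry` as rotation around the camera Y axis) and uses a mean L1 distance as a stand-in loss. The function names and the choice of L1 are assumptions for illustration, not the loss implemented in this repository.

```python
import math

def box3d_corners(x, y, z, h, w, l, ry):
    """Eight corners of a 3D box in camera coordinates (KITTI convention assumed).

    (x, y, z) is the bottom center of the box, y points down, and ry is the
    rotation around the camera Y axis.
    """
    # Corner offsets in the object frame: length along x, height along -y, width along z.
    xs = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    ys = [0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h]
    zs = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
    c, s = math.cos(ry), math.sin(ry)
    corners = []
    for ox, oy, oz in zip(xs, ys, zs):
        # Rotate around the Y axis, then translate to the box location.
        corners.append((x + c * ox + s * oz, y + oy, z - s * ox + c * oz))
    return corners

def spatial_point_alignment_loss(pred_box, gt_box):
    """Mean L1 distance between corresponding corners (illustrative choice of distance)."""
    pred = box3d_corners(*pred_box)
    gt = box3d_corners(*gt_box)
    total = sum(abs(p - g) for pc, gc in zip(pred, gt) for p, g in zip(pc, gc))
    return total / (8 * 3)

# Boxes are (x, y, z, h, w, l, ry); the prediction is 0.5 m too far along z.
gt_box = (1.0, 1.5, 20.0, 1.5, 1.6, 3.9, 0.3)
pred_box = (1.0, 1.5, 20.5, 1.5, 1.6, 3.9, 0.3)
print(spatial_point_alignment_loss(pred_box, gt_box))
```

Because every corner depends on location, dimensions, and orientation at once, an error in any single attribute shows up in this one quantity, which is the intuition behind a global corner constraint.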

To ensure training stability, we further introduce a Hierarchical Task Learning (HTL) strategy that progressively incorporates spatial-projection alignment as the 3D attribute predictions improve, preventing early-stage error propagation across attributes.
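One way to picture such a progressive schedule is a weight that keeps the alignment losses switched off until the prerequisite 3D attribute losses have dropped. The rule below is a hypothetical sketch: the function name, the `threshold` parameter, and the linear ramp are assumptions for illustration and are not taken from the paper or this codebase.

```python
def alignment_weight(current_losses, initial_losses, threshold=0.5):
    """Weight in [0, 1] for the alignment losses (hypothetical rule).

    The weight stays at 0 until the prerequisite losses have dropped, on
    average, below `threshold` times their initial values, then grows
    linearly to 1 as they approach 0.
    """
    ratios = [cur / max(init, 1e-12) for cur, init in zip(current_losses, initial_losses)]
    mean_ratio = sum(ratios) / len(ratios)
    weight = (threshold - mean_ratio) / threshold
    return min(1.0, max(0.0, weight))

# Early in training the prerequisite losses are near their initial values,
# so the alignment terms contribute nothing.
print(alignment_weight([1.0, 2.0], [1.0, 2.0]))
# Later, once they have dropped to a quarter of their initial values,
# the alignment terms are weighted at one half.
print(alignment_weight([0.25, 0.5], [1.0, 2.0]))
```

In a training loop, a weight like this would multiply the alignment terms before they are added to the base detection loss, so noisy early predictions of depth, dimensions, or orientation do not dominate the gradient.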

Key Features

  • 🎯 Spatial Point Alignment: Constrains 3D bounding box corners to align with ground-truth corners
  • 📐 3D-2D Projection Alignment: Ensures projected 3D boxes match their 2D detection boxes
  • 📈 Hierarchical Task Learning: Progressive training strategy for stable optimization
  • 🔌 Plug-and-Play: Can be easily integrated into any monocular 3D detector
  • Zero Inference Cost: No additional modules or computational overhead at inference time
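The 3D-2D Projection Alignment feature can be sketched in a few lines: project the eight 3D corners onto the image, take the enclosing rectangle, and compare it with the 2D detection box. The example assumes a simplified pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (KITTI calibration files provide a full 3x4 projection matrix instead) and uses `1 - IoU` as a stand-in for the alignment measure. All names here are illustrative, not this repository's API.

```python
def project_to_image(corners, fx, fy, cx, cy):
    """Pinhole projection of camera-frame 3D points (z > 0) to pixel coordinates."""
    return [(fx * X / Z + cx, fy * Y / Z + cy) for X, Y, Z in corners]

def enclosing_box(points):
    """Axis-aligned rectangle (x1, y1, x2, y2) around a set of 2D points."""
    us = [u for u, _ in points]
    vs = [v for _, v in points]
    return (min(us), min(vs), max(us), max(vs))

def iou_2d(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def projection_alignment_loss(corners, box2d, fx, fy, cx, cy):
    """1 - IoU between the projected 3D box outline and the 2D box (illustrative)."""
    projected = enclosing_box(project_to_image(corners, fx, fy, cx, cy))
    return 1.0 - iou_2d(projected, box2d)

# A 2 m cube spanning depths 10 m to 20 m, centered on the optical axis.
cube = [(X, Y, Z) for X in (-1.0, 1.0) for Y in (-1.0, 1.0) for Z in (10.0, 20.0)]
print(projection_alignment_loss(cube, (-10.0, -10.0, 10.0, 10.0), 100.0, 100.0, 0.0, 0.0))
```

Since a term like this only enters the training objective, it adds nothing to the network that runs at inference time, which is consistent with the zero-inference-cost property listed above.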

Installation

  1. Clone the repository and create a conda environment:

    git clone https://github.com/WYFDUT/SPAN.git
    cd SPAN
    
    conda create -n span python=3.8
    conda activate span
  2. Install dependencies:

    pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
    
    pip install -r requirements.txt
    
    cd lib/models/monodgp/ops/
    bash make.sh
    cd ../../../..
  3. Install OpenPCDet (if needed):

    cd OpenPCDet
    python setup.py develop
    cd ..

Data Preparation

Data Format

Download the KITTI dataset and arrange it in the following directory structure:

│SPAN/
├──...
│data/kitti/
├──ImageSets/
├──training/
│   ├──image_2
│   ├──label_2
│   ├──calib
├──testing/
│   ├──image_2
│   ├──calib

Update the dataset path in configs/span.yaml:

dataset:
  root_dir: '/path/to/KITTI'
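Before training, it can help to confirm that the dataset root matches the layout above. The helper below is a hypothetical convenience check, not part of this repository; it only looks for the subdirectories listed in the tree and assumes `root_dir` points at the folder that contains `ImageSets/`, `training/`, and `testing/`.

```python
from pathlib import Path

# Subdirectories expected under the dataset root, taken from the tree above.
EXPECTED_DIRS = [
    "ImageSets",
    "training/image_2",
    "training/label_2",
    "training/calib",
    "testing/image_2",
    "testing/calib",
]

def missing_kitti_dirs(root_dir):
    """Return the expected subdirectories that are absent under root_dir."""
    root = Path(root_dir)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    # Replace with the value of dataset.root_dir from configs/span.yaml.
    missing = missing_kitti_dirs("/path/to/KITTI")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("KITTI layout looks complete.")
```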

Training

Basic usage:

bash train.sh configs/span.yaml

To train on a specific GPU:

CUDA_VISIBLE_DEVICES=0 bash train.sh configs/span.yaml

Checkpoints are saved to the path specified in trainer.save_path.

Test

By default, the best checkpoint is evaluated. To evaluate a different one, change tester.checkpoint in configs/span.yaml:

bash test.sh configs/span.yaml

Results

The official results in the paper:

Models              Val, AP3D|R40
                    Easy      Mod.      Hard
MonoDGP + (SPAN)    30.98%    23.26%    20.17%

Results reproduced with this repository on the KITTI val split:

Models              Val, AP3D|R40                 Logs    ckpt
                    Easy      Mod.      Hard
MonoDGP + (SPAN)    31.92%    23.32%    20.00%    log     ckpt
                    30.94%    23.34%    20.21%    log     -
                    31.81%    23.44%    20.29%    log     ckpt
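To get a feel for run-to-run variation, the three validation runs above can be summarized with a mean and sample standard deviation. This is plain arithmetic on the numbers in the table; the `summarize` helper is a hypothetical name used only for this illustration.

```python
from statistics import mean, stdev

# (Easy, Mod., Hard) AP3D|R40 in percent for the three runs listed above.
RUNS = [
    (31.92, 23.32, 20.00),
    (30.94, 23.34, 20.21),
    (31.81, 23.44, 20.29),
]

def summarize(runs):
    """Return {difficulty: (mean, sample standard deviation)} across runs."""
    names = ("Easy", "Mod.", "Hard")
    return {name: (mean(vals), stdev(vals)) for name, vals in zip(names, zip(*runs))}

for name, (avg, std) in summarize(RUNS).items():
    print(f"{name}: mean {avg:.2f}%, sample std {std:.2f}%")
```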

The official results in the paper on KITTI Test Split:

Models              Test, AP3D|R40                ckpt
                    Easy      Mod.      Hard
MonoDGP + (SPAN)    27.02%    19.30%    16.49%    -

Test results submitted to the official KITTI Benchmark:

[Figure: KITTI benchmark results, Car category]

[Figure: KITTI benchmark results, all categories]

Citation

If you use this code in your research, please cite:

@misc{wang2025spanspatialprojectionalignmentmonocular,
      title={SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection}, 
      author={Yifan Wang and Yian Zhao and Fanqi Pu and Xiaochen Yang and Yang Tang and Xi Chen and Wenming Yang},
      year={2025},
      eprint={2511.06702},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.06702}, 
}

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

This repository builds on the excellent work of MonoDGP, OpenPCDet, MGIoU, and related monocular 3D detection frameworks.

About

[CVPR 2026] SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection
