Yifan Wang1,4, Yian Zhao2, Fanqi Pu1, Xiaochen Yang3, Yang Tang4,†, Xi Chen4, Wenming Yang1,†
1Tsinghua University 2Peking University 3University of Glasgow 4Tencent BAC
†Corresponding Authors
📧 yf-wang23@mails.tsinghua.edu.cn yang.wenming@sz.tsinghua.edu.cn
This repository hosts the official implementation of SPAN (Spatial-Projection Alignment), a novel framework for monocular 3D object detection that addresses the geometric consistency constraints overlooked in existing decoupled regression paradigms.
SPAN introduces a unified geometric consistency optimization paradigm that comprises two pivotal components:
- Spatial Point Alignment: Enforces an explicit global spatial constraint between predicted and ground-truth 3D bounding boxes by aligning their eight corner coordinates in the camera coordinate system, thereby rectifying the spatial drift caused by decoupled attribute regression.
- 3D-2D Projection Alignment: Ensures that the projected 3D box fits tightly within its corresponding 2D detection box on the image plane, mitigating the projection misalignment overlooked in previous works.
To ensure training stability, we further introduce a Hierarchical Task Learning (HTL) strategy that progressively incorporates spatial-projection alignment as the 3D attribute predictions refine, preventing early-stage error propagation across attributes.
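As an illustrative sketch only (not the official implementation), the two alignment terms can be written as PyTorch-style losses. The function names, corner ordering, and simple pinhole projection with intrinsics `K` below are assumptions for exposition:

```python
import torch

def box_corners(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates.
    center: (B, 3), dims: (B, 3) as (h, w, l), yaw: (B,) rotation about the y-axis."""
    h, w, l = dims[:, 0], dims[:, 1], dims[:, 2]
    x = torch.stack([l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2], dim=1)
    y = torch.stack([torch.zeros_like(h)] * 4 + [-h] * 4, dim=1)  # bottom then top face
    z = torch.stack([w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2], dim=1)
    cos, sin = yaw.cos().unsqueeze(1), yaw.sin().unsqueeze(1)
    xr = cos * x + sin * z        # rotate corners around the camera y-axis
    zr = -sin * x + cos * z
    pts = torch.stack([xr, y, zr], dim=2)          # (B, 8, 3)
    return pts + center.unsqueeze(1)

def spatial_point_loss(pred, gt):
    """Spatial Point Alignment: L1 distance between the eight predicted
    and ground-truth corners in camera coordinates."""
    return (box_corners(*pred) - box_corners(*gt)).abs().mean()

def projection_alignment_loss(pred, K, box2d):
    """3D-2D Projection Alignment: penalize the gap between the image-plane
    envelope of the projected 3D corners and the 2D detection box.
    K: (3, 3) camera intrinsics, box2d: (B, 4) as (x1, y1, x2, y2)."""
    pts = box_corners(*pred)                        # (B, 8, 3)
    uv = pts @ K.T                                  # pinhole projection
    uv = uv[..., :2] / uv[..., 2:3].clamp(min=1e-6)
    proj = torch.cat([uv.min(dim=1).values, uv.max(dim=1).values], dim=1)
    return (proj - box2d).abs().mean()
```

Both terms operate purely on predicted attributes, so they add loss terms at training time without touching the inference path.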
- 🎯 Spatial Point Alignment: Constrains 3D bounding box corners to align with ground-truth corners
- 📐 3D-2D Projection Alignment: Ensures projected 3D boxes match their 2D detection boxes
- 📈 Hierarchical Task Learning: Progressive training strategy for stable optimization
- 🔌 Plug-and-Play: Can be easily integrated into any monocular 3D detector
- ⚡ Zero Inference Cost: No additional modules or computational overhead at inference time
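A minimal sketch of how such a progressive HTL schedule could look when plugging the alignment losses into an existing detector. The warm-up fraction, linear ramp, and function names are illustrative assumptions, not the exact schedule from the paper:

```python
def htl_weight(epoch, total_epochs, warmup_frac=0.4):
    """Weight for the spatial-projection alignment terms: zero during early
    epochs, then a linear ramp to 1, so alignment is only enforced once the
    decoupled 3D attribute heads are reasonably accurate."""
    warmup = warmup_frac * total_epochs
    if epoch < warmup:
        return 0.0
    return min(1.0, (epoch - warmup) / max(total_epochs - warmup, 1e-6))

def total_loss(base_losses, align_losses, epoch, total_epochs):
    """Combine the detector's own losses with the weighted alignment losses."""
    w = htl_weight(epoch, total_epochs)
    return sum(base_losses) + w * sum(align_losses)
```

Because the schedule only rescales extra loss terms, the host detector's architecture and inference path are left untouched.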
- Clone the repository:

  ```bash
  git clone https://github.com/WYFDUT/SPAN.git
  cd SPAN
  conda create -n span python=3.8
  conda activate span
  ```

- Install dependencies:

  ```bash
  pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
  pip install -r requirements.txt
  cd lib/models/monodgp/ops/
  bash make.sh
  cd ../../../..
  ```

- Install OpenPCDet (if needed):

  ```bash
  cd OpenPCDet
  python setup.py develop
  cd ..
  ```
Download KITTI datasets and prepare the directory structure as:
```
│SPAN/
├──...
│data/kitti/
├──ImageSets/
├──training/
│   ├──image_2
│   ├──label_2
│   ├──calib
├──testing/
│   ├──image_2
│   ├──calib
```

Update the dataset path in `configs/span.yaml`:
```yaml
dataset:
  root_dir: '/path/to/KITTI'
```

Basic usage:
```bash
bash train.sh configs/span.yaml
```

With a custom GPU:
```bash
CUDA_VISIBLE_DEVICES=0 bash train.sh configs/span.yaml
```

Checkpoints are saved to the path specified in `trainer.save_path`.
The best checkpoint is evaluated by default; you can change this via `tester/checkpoint` in `configs/span.yaml`:

```bash
bash test.sh configs/span.yaml
```

The official results in the paper (KITTI Val split, AP3D|R40):

| Models | Easy | Mod. | Hard |
|:------:|:----:|:----:|:----:|
| MonoDGP + (SPAN) | 30.98% | 23.26% | 20.17% |
This repo's results on the KITTI Val split (AP3D|R40):

| Models | Easy | Mod. | Hard | Logs | ckpt |
|:------:|:----:|:----:|:----:|:----:|:----:|
| MonoDGP + (SPAN) | 31.92% | 23.32% | 20.00% | log | ckpt |
| | 30.94% | 23.34% | 20.21% | log | - |
| | 31.81% | 23.44% | 20.29% | log | ckpt |
The official results in the paper on the KITTI Test split (AP3D|R40):

| Models | Easy | Mod. | Hard | ckpt |
|:------:|:----:|:----:|:----:|:----:|
| MonoDGP + (SPAN) | 27.02% | 19.30% | 16.49% | - |
Test results submitted to the official KITTI Benchmark:
Car category:
All categories:
If you use this code in your research, please cite:
```bibtex
@misc{wang2025spanspatialprojectionalignmentmonocular,
      title={SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection},
      author={Yifan Wang and Yian Zhao and Fanqi Pu and Xiaochen Yang and Yang Tang and Xi Chen and Wenming Yang},
      year={2025},
      eprint={2511.06702},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.06702},
}
```

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
This repo benefits from the excellent works MonoDGP, OpenPCDet, and MGIoU, as well as related monocular 3D detection frameworks.