[AAAI 2023] PolarFormer: Multi-camera 3D Object Detection with Polar Transformers

PolarFormer

Paper

PolarFormer: Multi-camera 3D Object Detection with Polar Transformers,
Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang
AAAI 2023

This repository is an official implementation of PolarFormer.


Abstract

3D object detection in autonomous driving aims to reason "what" and "where" the objects of interest are present in a 3D world. Following the conventional wisdom of previous 2D object detection, existing 3D object detection methods often adopt the canonical Cartesian coordinate system with perpendicular axes. However, we argue that this does not fit the nature of the ego car's perspective, as each onboard camera perceives the world in the shape of a wedge intrinsic to the imaging geometry, with radial (non-perpendicular) axes. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV), taking as input only multi-camera 2D images. Specifically, we design a cross-attention based Polar detection head without restriction on the shape of the input structure to deal with irregular Polar grids. For tackling the unconstrained object scale variations along the Polar distance dimension, we further introduce a multi-scale Polar representation learning strategy. As a result, our model can make the best use of the Polar representation rasterized via attending to the corresponding image observation in a sequence-to-sequence fashion, subject to the geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that our PolarFormer significantly outperforms state-of-the-art 3D object detection alternatives, as well as yielding competitive performance on the BEV semantic segmentation task.
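To make the polar-grid idea above concrete, here is a minimal sketch of mapping a Cartesian BEV point in ego coordinates to a (radial, azimuth) cell of a polar raster. The grid resolution and range are made-up illustration parameters, not the repository's actual rasterization settings:

```python
import math

def cartesian_to_polar_bin(x, y, num_radial_bins=64, num_azimuth_bins=128,
                           max_radius=51.2):
    """Map a BEV point (x, y) in ego coordinates to a (radial, azimuth) grid cell.

    Hypothetical grid layout for illustration only; PolarFormer's actual
    rasterization is defined by the repository's configs and plugin code.
    """
    r = math.hypot(x, y)                      # distance from the ego car
    theta = math.atan2(y, x)                  # azimuth angle in [-pi, pi]
    r_idx = min(int(r / max_radius * num_radial_bins), num_radial_bins - 1)
    a_idx = int((theta + math.pi) / (2 * math.pi) * num_azimuth_bins) % num_azimuth_bins
    return r_idx, a_idx
```

Each azimuth slice of such a grid is a wedge radiating from the ego car, which matches the frustum of an onboard camera more naturally than rectangular Cartesian cells.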

News

  • (2022.11.25): Detection code of PolarFormer is released.
  • (2022.7.1): The paper of PolarFormer is released on arXiv.
  • (2022.5.18): PolarFormer achieves state-of-the-art performance among the published works (57.2% NDS and 49.3% mAP) on nuScenes 3D object detection leaderboard.
  • (2022.5.16): PolarFormer-pure achieves state-of-the-art performance among the published works (54.3% NDS and 45.7% mAP) on nuScenes 3D object detection (without external data) leaderboard.

Get Started

Environment

This implementation is built upon detr3d; please follow the steps in install.md to prepare the environment.

Data

Please follow the official mmdetection3d instructions (https://mmdetection3d.readthedocs.io/en/v0.17.3/datasets/nuscenes_det.html) to process the nuScenes dataset.

After preparation, you will be able to see the following directory structure:

PolarFormer
├── mmdetection3d
├── projects
│   ├── configs
│   ├── mmdet3d_plugin
├── tools
├── data
│   ├── nuscenes
├── ckpts
├── README.md

Train & inference

cd PolarFormer

You can train the model as follows:

tools/dist_train.sh projects/configs/polarformer/polarformer_r101.py 8 --work-dir work_dirs/polarformer_r101/

You can evaluate the model as follows:

tools/dist_test.sh projects/configs/polarformer/polarformer_r101.py work_dirs/polarformer_r101/latest.pth 8 --eval bbox

Main Results

3D Object Detection on nuScenes test set:

| model | mAP | NDS |
| --- | --- | --- |
| PolarFormer, R101_DCN | 41.5 | 47.0 |
| PolarFormer-T, R101_DCN | 45.7 | 54.3 |
| PolarFormer, V2-99 | 45.5 | 50.3 |
| PolarFormer-T, V2-99 | 49.3 | 57.2 |
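The NDS values reported above are the nuScenes Detection Score, which combines mAP with the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE), each clipped to [0, 1]. A small sketch of that formula, following the nuScenes devkit definition:

```python
def nds(map_score, tp_errors):
    """nuScenes Detection Score.

    map_score: mean average precision in [0, 1].
    tp_errors: the five true-positive error metrics
               (mATE, mASE, mAOE, mAVE, mAAE), each clipped to [0, 1].
    """
    assert len(tp_errors) == 5
    return (5 * map_score + sum(1 - min(1.0, e) for e in tp_errors)) / 10
```

mAP is weighted five times as heavily as any single error term, so NDS can exceed mAP when the true-positive errors are small, as in the tables here.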

3D Object Detection on nuScenes validation set:

| model | mAP | NDS | config | download |
| --- | --- | --- | --- | --- |
| PolarFormer, R101_DCN | 39.6 | 45.8 | config | ckpt |
| PolarFormer-w/o_bev_aug, R101_DCN | 39.2 | 46.0 | config | ckpt / log |
| PolarFormer-T, R101_DCN | 43.2 | 52.8 | - | - |
| PolarFormer, V2-99 | 50.0 | 56.2 | config | ckpt |

Note: We adopt BEV data augmentation (random flipping, scaling, and rotation) as the default setting when training PolarFormer on the nuScenes dataset. However, as the ablation in the second row indicates, BEV augmentation contributes little to the overall performance of PolarFormer, so feel free to set "use_bev_aug = False" during training if you want to reduce the computational cost.
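For intuition, BEV augmentation of the kind mentioned above acts on ground-plane coordinates. Here is an illustrative sketch on a single box center; the parameter ranges are invented for the example and do not reflect the repository's actual config values:

```python
import math
import random

def augment_bev_center(x, y, flip_prob=0.5, scale_range=(0.95, 1.05),
                       rot_range=(-math.pi / 8, math.pi / 8), rng=random):
    """Illustrative BEV augmentation of a box center: random horizontal flip,
    global scaling, and yaw rotation. Ranges here are hypothetical."""
    if rng.random() < flip_prob:
        y = -y                                   # flip across the x axis
    s = rng.uniform(*scale_range)
    x, y = x * s, y * s                          # global scaling
    a = rng.uniform(*rot_range)
    x, y = (x * math.cos(a) - y * math.sin(a),   # yaw rotation about the ego
            x * math.sin(a) + y * math.cos(a))
    return x, y
```

Flip and rotation preserve the distance to the ego car, so only the scaling step changes a point's radius, and only within the chosen scale range.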

BEV Segmentation on nuScenes validation set:

| model | Drivable | Crossing | Walking | Carpark | Divider |
| --- | --- | --- | --- | --- | --- |
| PolarFormer, efficientnet-b0 | 81.0 | 48.9 | 55.8 | 52.6 | 42.2 |
| PolarFormer-T, efficientnet-b0 | 82.6 | 54.3 | 59.4 | 56.7 | 46.2 |
| PolarFormer-joint_det_seg, R101_DCN | 82.6 | 50.1 | 57.4 | 54.1 | 44.5 |
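The segmentation numbers above are per-class BEV IoU scores (in %). When a single summary number is wanted, a common convention is the unweighted mean over the classes; a trivial helper for that:

```python
def mean_iou(class_ious):
    """Unweighted mean of per-class BEV IoU scores (in %)."""
    return sum(class_ious) / len(class_ious)

# Mean over the five classes for the PolarFormer, efficientnet-b0 row above.
row_mean = mean_iou([81.0, 48.9, 55.8, 52.6, 42.2])
```

Note this unweighted mean is a convenience for comparison here, not a metric reported by the repository itself.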

Visualization


Reference

@inproceedings{jiang2022polar,
  title={PolarFormer: Multi-camera 3D Object Detection with Polar Transformers},
  author={Jiang, Yanqin and Zhang, Li and Miao, Zhenwei and Zhu, Xiatian and Gao, Jin and Hu, Weiming and Jiang, Yu-Gang},
  booktitle={AAAI},
  year={2023}
}

Acknowledgement

Many thanks to the open-source projects this implementation builds upon, in particular mmdetection3d and detr3d.
