SparseBEV

This is the official PyTorch implementation for paper SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos. (ICCV 2023)

中文解读：https://zhuanlan.zhihu.com/p/654821380

Model Zoo

Setting	Pretrain	Training Cost	NDS_val	NDS_test	FPS	Weights
r50_nuimg_704x256	nuImg	21h (8x2080Ti)	55.6	-	15.8	gdrive
r50_nuimg_704x256_400q_36ep	nuImg	28h (8x2080Ti)	55.8	-	23.5	gdrive
r101_nuimg_1408x512	nuImg	2d8h (8xV100)	59.2	-	6.5	gdrive
vov99_dd3d_1600x640_trainval_future	DD3D	4d1h (8xA100)	84.9	67.5	-	gdrive

We use r50_nuimg_704x256 for ablation studies and r50_nuimg_704x256_400q_36ep for comparison with others.
We recommend using r50_nuimg_704x256 to validate new ideas since it trains faster and the result is more stable.
FPS is measured with AMD 5800X CPU and RTX 3090 GPU (without fp16).
The noise is around 0.3 NDS.

Environment

Install PyTorch 2.0 + CUDA 11.8:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

or PyTorch 1.10.2 + CUDA 10.2 for older GPUs:

conda create -n sparsebev python=3.8
conda activate sparsebev
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch

Install other dependencies:

pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5

Install turbojpeg and pillow-simd to speed up data loading (optional but important):

sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1

Compile CUDA extensions:

cd models/csrc
python setup.py build_ext --inplace

Prepare Dataset

Download nuScenes from https://www.nuscenes.org/nuscenes and put it in data/nuscenes.
Download the generated info file from Google Drive and unzip it.
Folder structure:

data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
└── v1.0-trainval

These *.pkl files can also be generated with our script: gen_sweep_info.py.

Training

Download pretrained weights and put it in directory pretrain/:

pretrain
├── cascade_mask_rcnn_r101_fpn_1x_nuim_20201024_134804-45215b1e.pth
├── cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth

Train SparseBEV with 8 GPUs:

torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256.py

Train SparseBEV with 4 GPUs (i.e the last four GPUs):

export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256.py

The batch size for each GPU will be scaled automatically. So there is no need to modify the batch_size in config files.

Evaluation

Single-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Multi-GPU evaluation:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Timing

FPS is measured with a single GPU:

export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth

Acknowledgements

Many thanks to these excellent open-source projects:

3D Detection: DETR3D, PETR, BEVFormer, BEVDet, StreamPETR
2D Detection: AdaMixer, DN-DETR
Codebase: MMDetection3D, CamLiFlow

MCG-NJU/SparseBEV

MCG-NJU

Reviews

Repository Details