Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023)
This is the official PyTorch implementation of SphereFormer (CVPR 2023).
Spherical Transformer for LiDAR-based 3D Recognition [Paper]
Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia
Highlight
- SphereFormer is a plug-and-play transformer module. We develop radial window attention, which significantly boosts segmentation performance on distant points, e.g., from 13.3% to 30.4% mIoU on the nuScenes lidarseg val set.
- It achieves superior performance on various outdoor semantic segmentation benchmarks, e.g., nuScenes, SemanticKITTI, and Waymo, and also shows competitive results on the nuScenes detection dataset.
- This repository employs a fast and memory-efficient library for sparse transformer with varying token numbers, SparseTransformer.
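To make the idea of radial window attention more concrete, here is a minimal sketch (not the implementation in this repository) of how points could be grouped into radial windows. Each point is converted to spherical coordinates and assigned to a window by its two angles only, so one window stretches from near to far along the radial direction. The helper name `radial_window_index` and the 2-degree window sizes are assumptions made for illustration.

```python
import math

def radial_window_index(x, y, z, theta_size_deg=2.0, phi_size_deg=2.0):
    """Return the (theta_bin, phi_bin) radial window of one point.

    Hypothetical helper for illustration; the window sizes are assumed values.
    """
    r = math.sqrt(x * x + y * y + z * z)
    # Azimuth: angle in the x-y plane, wrapped into [0, 360).
    theta = math.degrees(math.atan2(y, x)) % 360.0
    # Inclination: angle from the +z axis, in [0, 180].
    cos_phi = max(-1.0, min(1.0, z / r)) if r > 0 else 1.0
    phi = math.degrees(math.acos(cos_phi))
    # Only the two angles select the window; the radius r is ignored,
    # so near and far points along the same ray share a window.
    return int(theta // theta_size_deg), int(phi // phi_size_deg)

# Two points on the same ray, one close and one distant.
near = radial_window_index(5.0, 0.1, 0.0)
far = radial_window_index(60.0, 1.2, 0.0)
print(near == far)
```

Because the two points fall into the same window, a distant point can attend to the denser points close to the sensor, which is the intuition behind the gain on distant points.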
Get Started
For object detection, please go to the detection/ directory.
The guide below is for semantic segmentation.
Environment
Install dependencies (tested with python==3.7.9, pytorch==1.8.0, cuda==11.1, gcc==7.5.0)
git clone https://github.com/dvlab-research/SphereFormer.git --recursive
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch_scatter==2.0.9
pip install torch_geometric==1.7.2
pip install spconv-cu114==2.1.21
pip install torch_sparse==0.6.12 cumm-cu114==0.2.8 torch_cluster==1.5.9
pip install tensorboard timm termcolor tensorboardX
Install sptr
cd third_party/SparseTransformer && python setup.py install
Note: Make sure gcc and cuda are installed and that nvcc works. If you installed cuda through conda, nvcc is not provided, so you need to install cuda manually.
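Before building sptr, it may help to confirm that both tools are on your PATH. The snippet below is a small sketch using only the Python standard library; the function name `check_build_tools` is made up for this example.

```python
import shutil

def check_build_tools(tools=("gcc", "nvcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in check_build_tools().items():
        # A missing nvcc usually means cuda was installed without the compiler.
        print(f"{tool}: {path if path else 'NOT FOUND'}")
```

If nvcc is reported as not found, install the cuda toolkit manually as described in the note above.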
Datasets Preparation
nuScenes
Download the nuScenes dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.
nuscenes/
|--- v1.0-trainval/
|--- samples/
|------- LIDAR_TOP/
|--- lidarseg/
|------- v1.0-trainval/
Then fill in data_path and save_dir in data/nuscenes_preprocess_infos.py, and generate the infos with
pip install nuscenes-devkit pyquaternion
cd data && python nuscenes_preprocess_infos.py
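If the preprocessing script fails, a common cause is a directory that is not where the layout above expects it. The following sketch checks the nuScenes layout listed in this section; `missing_dirs` is a hypothetical helper and `/path/to/nuscenes` is a placeholder for your own data_root.

```python
from pathlib import Path

# Sub-directories taken from the nuScenes layout listed in this section.
EXPECTED_NUSCENES_DIRS = (
    "v1.0-trainval",
    "samples/LIDAR_TOP",
    "lidarseg/v1.0-trainval",
)

def missing_dirs(data_root, expected=EXPECTED_NUSCENES_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]

if __name__ == "__main__":
    # Placeholder path; replace it with your own data_root.
    for rel in missing_dirs("/path/to/nuscenes"):
        print(f"missing: {rel}")
```

The same function can be reused for the other datasets by passing a different `expected` tuple.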
SemanticKITTI
Download the SemanticKITTI dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.
dataset/
|--- sequences/
|------- 00/
|------- 01/
|------- 02/
|------- 03/
|------- .../
Waymo Open Dataset
Download the Waymo Open Dataset from here. Unzip and arrange it as follows. Then fill in the data_root entry in the .yaml configuration file.
waymo/
|--- training/
|--- validation/
|--- testing/
Then convert the raw files into the SemanticKITTI format as follows. (Note: do not use a GPU for this step; the CPU is sufficient.)
cd data/waymo_to_semanticKITTI
CUDA_VISIBLE_DEVICES="" python convert.py --load_dir [YOUR_DATA_ROOT] --save_dir [YOUR_SAVE_ROOT]
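If you prefer to launch the conversion from Python, the sketch below shows the same idea: an empty CUDA_VISIBLE_DEVICES hides every GPU from the child process. The helper names `cpu_only_env` and `convert_command` are made up for this example; only the `--load_dir` and `--save_dir` flags come from the command above.

```python
import os
import subprocess
import sys

def cpu_only_env(base_env=None):
    """Copy an environment and hide every GPU from CUDA programs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env

def convert_command(load_dir, save_dir):
    """Build the argument list for convert.py with the two documented flags."""
    return [sys.executable, "convert.py", "--load_dir", load_dir, "--save_dir", save_dir]

if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    subprocess.run(
        convert_command("/path/to/waymo", "/path/to/output"),
        env=cpu_only_env(),
        cwd="data/waymo_to_semanticKITTI",
        check=True,
    )
```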
Training
nuScenes
python train.py --config config/nuscenes/nuscenes_unet32_spherical_transformer.yaml
SemanticKITTI
python train.py --config config/semantic_kitti/semantic_kitti_unet32_spherical_transformer.yaml
Waymo Open Dataset
python train.py --config config/waymo/waymo_unet32_spherical_transformer.yaml
Validation
For validation, modify the .yaml config file: (1) set weight to the path of the model weights (.pth file); (2) set val to True; (3) for test-time augmentation, set use_tta to True and set vote_num accordingly. Then run the following command.
python train.py --config [YOUR_CONFIG_PATH]
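The edits can also be applied with a small script. This is a sketch that rewrites `key: value` lines as plain text; it assumes each key sits on its own line, which is an assumption about the config layout, and it drops any trailing comment on the lines it rewrites. The `set_yaml_scalars` helper, the sample config text, and the value 4 for vote_num are made up for illustration.

```python
import re

def set_yaml_scalars(text, updates):
    """Replace the value of every `key: value` line whose key is in `updates`.

    Indentation is preserved; lines with other keys are left untouched.
    """
    out = []
    for line in text.splitlines():
        match = re.match(r"^(\s*)([A-Za-z_]\w*)\s*:", line)
        if match and match.group(2) in updates:
            indent, key = match.group(1), match.group(2)
            line = f"{indent}{key}: {updates[key]}"
        out.append(line)
    return "\n".join(out) + "\n"

# Made-up sample; the real config files contain many more entries.
sample = "TRAIN:\n  weight:\n  val: False\n  use_tta: False\n  vote_num: 1\n"
updated = set_yaml_scalars(
    sample,
    {"weight": "path/to/model.pth", "val": "True", "use_tta": "True", "vote_num": "4"},
)
print(updated)
```

Write the result to a copy of the config and pass that copy to `--config`, so the training config stays unchanged.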
Pre-trained Models
dataset | Val mIoU (tta) | Val mIoU | mIoU_close | mIoU_medium | mIoU_distant | Download |
---|---|---|---|---|---|---|
nuScenes | 79.5 | 78.4 | 80.8 | 60.8 | 30.4 | Model Weight |
SemanticKITTI | 69.0 | 67.8 | 68.6 | 60.4 | 17.8 | Model Weight |
Waymo Open Dataset | 70.8 | 69.9 | 70.3 | 68.6 | 61.9 | N/A |
Note: Pre-trained weights for the Waymo Open Dataset are not released due to the dataset's regulations.
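The mIoU_close, mIoU_medium, and mIoU_distant columns report mIoU over points grouped by their distance to the sensor. The sketch below illustrates that kind of evaluation in plain Python. It is a simplified illustration, not the evaluation code of this repository: the 20 m and 50 m thresholds are assumptions here (please check the paper for the exact definition), and ignored labels are not handled.

```python
def distance_bucket(dist, close_max=20.0, medium_max=50.0):
    """Assign a distance in metres to a bucket (thresholds are assumed values)."""
    if dist <= close_max:
        return "close"
    if dist <= medium_max:
        return "medium"
    return "distant"

def miou(preds, labels, num_classes):
    """Mean IoU over the classes that appear in preds or labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, l in zip(preds, labels) if p == c and l == c)
        union = sum(1 for p, l in zip(preds, labels) if p == c or l == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")

def miou_by_distance(dists, preds, labels, num_classes):
    """Compute mIoU separately for close, medium, and distant points."""
    buckets = {"close": ([], []), "medium": ([], []), "distant": ([], [])}
    for d, p, l in zip(dists, preds, labels):
        bucket_preds, bucket_labels = buckets[distance_bucket(d)]
        bucket_preds.append(p)
        bucket_labels.append(l)
    return {name: miou(bp, bl, num_classes) for name, (bp, bl) in buckets.items()}

# Toy example with two classes and six points.
scores = miou_by_distance(
    dists=[5, 10, 30, 40, 60, 70],
    preds=[0, 1, 0, 0, 1, 0],
    labels=[0, 1, 0, 1, 1, 1],
    num_classes=2,
)
print(scores)  # {'close': 1.0, 'medium': 0.25, 'distant': 0.25}
```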
SpTr Library
The SpTr library is highly recommended for sparse transformers, particularly for 3D point cloud attention. It is fast, memory-efficient, and easy to use. The GitHub repository is https://github.com/dvlab-research/SparseTransformer.git.
Citation
If you find this project useful, please consider citing:
@inproceedings{lai2023spherical,
title={Spherical Transformer for LiDAR-based 3D Recognition},
author={Lai, Xin and Chen, Yukang and Lu, Fanbin and Liu, Jianhui and Jia, Jiaya},
booktitle={CVPR},
year={2023}
}
Our Works on 3D Point Cloud
- Spherical Transformer for LiDAR-based 3D Recognition (CVPR 2023) [Paper] [Code]: A plug-and-play transformer module that boosts performance in distant regions (for 3D LiDAR point clouds)
- Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022) [Paper] [Code]: Point-based window transformer for 3D point cloud segmentation
- SparseTransformer (SpTr) Library [Code]: A fast, memory-efficient, and easy-to-use library for sparse transformer with varying token numbers.