This repository provides the official code of "VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention"

VISTA


VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Shengheng Deng, Zhihao Liang, Lin Sun and Kui Jia*

(*) Corresponding author

Introduction

Detecting objects from LiDAR point clouds is of tremendous significance in autonomous driving. In spite of good progress, accurate and reliable 3D detection is yet to be achieved due to the sparsity and irregularity of LiDAR point clouds. Among existing strategies, multi-view methods have shown great promise by leveraging the more comprehensive information from both bird's eye view (BEV) and range view (RV). These multi-view methods either refine the proposals predicted from a single view via fused features, or fuse the features without considering the global spatial context; consequently, their performance is limited. In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one. Thanks to the learned attention mechanism, VISTA can produce fused features of high quality for the prediction of proposals. We decouple the classification and regression tasks in VISTA, and an additional constraint on the attention variance is applied that enables the attention module to focus on specific targets instead of generic points. [arxiv]
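
To make the convolutional-attention idea concrete, below is a minimal, self-contained PyTorch sketch of cross-view attention in which convolutional projections replace the usual linear (MLP) ones. It is illustrative only, not the actual VISTA module from this repo; all names and shapes are assumptions.

import torch
import torch.nn as nn

class ConvCrossViewAttention(nn.Module):
    # Illustrative cross-view attention: BEV features query RV features.
    def __init__(self, channels):
        super().__init__()
        # 3x3 convolutions replace linear projections, keeping local
        # spatial context in the queries and keys.
        self.q_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.k_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.v_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, bev, rv):
        # bev: (B, C, Hb, Wb) bird's eye view features (queries)
        # rv:  (B, C, Hr, Wr) range view features (keys/values)
        b, c, h, w = bev.shape
        q = self.q_conv(bev).flatten(2).transpose(1, 2)  # (B, Hb*Wb, C)
        k = self.k_conv(rv).flatten(2)                   # (B, C, Hr*Wr)
        v = self.v_conv(rv).flatten(2).transpose(1, 2)   # (B, Hr*Wr, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # global cross-view weights
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

fuse = ConvCrossViewAttention(64)
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 64))
print(out.shape)  # torch.Size([2, 64, 32, 32])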

Requirements

  • Linux
  • Python 3.7+ (Tested on 3.7)
  • PyTorch 1.8 or higher (Tested on 1.8.1)
  • CUDA 11.1 or higher (Tested on 11.1)
  • spconv 2.0+

Notes

  • spconv must be the exact version given in the instructions below (a quick version check is sketched after the installation steps)

  • nuscenes-devkit must be the exact version given in the instructions below

Installation

Make sure your GPU driver and system environment support the PyTorch version below

conda create --name vista python=3.7
conda activate vista
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

git clone https://github.com/Gorilla-Lab-SCUT/VISTA.git

pip install -r requirements.txt

python setup.py build develop
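
Once everything is built, you can run a quick sanity check of the PyTorch install (a minimal sketch; the expected values follow the tested versions listed above):

import torch

print(torch.__version__)          # expect 1.8.1+cu111
print(torch.version.cuda)         # expect 11.1
print(torch.cuda.is_available())  # expect True on a working GPU setup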

Spconv

Please refer to spconv for detailed installation instructions.

In our case, we use the command below to install the latest spconv 2.x, which is faster and lighter than spconv 1.x and easier to install:

pip install spconv-cu111

NOTE: You need to install the spconv build that matches your CUDA version!
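
You can read the CUDA version your PyTorch build uses and then verify the spconv install (a minimal sketch; it assumes spconv 2.x exposes __version__):

import torch
import spconv

# cu111 -> spconv-cu111, cu113 -> spconv-cu113, and so on.
print(torch.version.cuda)   # 11.1 for the wheel installed above
print(spconv.__version__)   # should report a 2.x version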

Nuscenes-Devkit

git clone https://github.com/AndlollipopDE/nuscenes.git
cd nuscenes
pip install -r requirements.txt
python setup.py install

Data Preparation

Download the nuScenes data and organise it as follows

NUSCENES_TRAINVAL_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       └── v1.0-trainval <-- metadata and annotations
NUSCENES_TEST_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       └── v1.0-test     <-- metadata

Then run the following command to create the data pkls for the trainval set

python tools/create_data.py nuscenes_data_prep --root_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --nsweeps=10

If you want to create the data pkl for the test set:

python tools/create_data.py nuscenes_data_prep_test --root_path=NUSCENES_TEST_DATASET_ROOT --nsweeps=10
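
To confirm the pkls were written correctly, you can load one back (a minimal sketch; the exact output filename is generated by create_data.py, so the path below is a placeholder):

import pickle

# Placeholder path: substitute the pkl file actually produced by create_data.py.
with open("NUSCENES_TRAINVAL_DATASET_ROOT/infos_train.pkl", "rb") as f:
    infos = pickle.load(f)
print(type(infos), len(infos))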

Training

We provide the configurations. Please modify the data path and batch size accordingly.

To train VISTA, run the following command. Note that you should modify the work directory path and the CUDA GPU number in the script.

./tools/scripts/train.sh experiment_description configuration_path

To resume training, run

./tools/scripts/train.sh experiment_description configuration_path resume_checkpoint_path

Evaluation and Testing

To evaluate VISTA on the validation set, simply run

./tools/scripts/test.sh configuration_path work_dir workdir/checkpoint.pth

To test VISTA on the test set, enable the test flag in test.sh and replace the testing pkl path in dist_test.py.
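
Before running evaluation, it can help to sanity-check the checkpoint file itself (a minimal sketch; only torch.load is a given here, and the checkpoint's internal layout is an assumption):

import torch

# Load on CPU so no GPU is needed just to inspect the file.
ckpt = torch.load("workdir/checkpoint.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights plus optimizer/epoch metadata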

Pretrained model

We provide a pretrained model trained on the nuScenes dataset; the configuration is exactly the one we provide. The pretrained model can be downloaded from Google Drive. The performance of the pretrained model on the validation set of nuScenes is presented down below (double flip enabled).

mAP    NDS    Car AP  Truck AP  Bus AP  Trailer AP
62.83  69.52  85.93   60.73     68.40   41.42

Cons Vehicle AP  Pedestrian AP  Motorcycle AP  Bicycle AP  Traffic Cone AP  Barrier AP
23.50            85.40          70.20          55.53       71.47            65.84
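
As a quick consistency check, the nuScenes mAP is the mean of the ten class APs, which reproduces the table (up to rounding):

# Mean of the ten class APs above vs. the reported mAP of 62.83.
aps = [85.93, 60.73, 68.40, 41.42, 23.50, 85.40, 70.20, 55.53, 71.47, 65.84]
print(sum(aps) / len(aps))  # 62.842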

Acknowledgement

This repo is built upon several open-sourced codebases; shout out to them for their amazing work.

Citation

If you find this work useful in your research, please cite

@inproceedings{deng2022vista,
  title={VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention},
  author={Deng, Shengheng and Liang, Zhihao and Sun, Lin and Jia, Kui},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Bugs

If you find any bugs in this repo, please let me know!
