  • Stars
    787
  • Rank 56,093 (Top 2 %)
  • Language
    Python
  • License
    Other
  • Created about 2 years ago
  • Updated 8 months ago

Repository Details

[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Position Embedding Transformation for Multi-View 3D Object Detection

This repository is the official implementation of PETR and PETRv2. The flash attention version can be found in the "flash" branch.


PETR develops position embedding transformation (PETR) for multi-view 3D object detection. PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features. Object queries can then perceive the 3D position-aware features and perform end-to-end object detection. PETR can serve as a simple yet strong baseline for future research.
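
The idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the repository's code: the function names (`frustum_points`, `pixels_to_3d`, `position_aware_features`), the depth bins, the toy intrinsics, and the single random linear layer standing in for a learned position encoder are all hypothetical, and coordinate normalization is omitted.

```python
import numpy as np

def frustum_points(h, w, depths, stride):
    """Homogeneous frustum points (u*d, v*d, d, 1) for each feature cell and depth bin.

    Returns an array of shape (h, w, D, 4).
    """
    us = (np.arange(w) + 0.5) * stride  # pixel u of each cell center
    vs = (np.arange(h) + 0.5) * stride  # pixel v of each cell center
    v, u, d = np.meshgrid(vs, us, depths, indexing="ij")  # each (h, w, D)
    return np.stack([u * d, v * d, d, np.ones_like(d)], axis=-1)

def pixels_to_3d(points, img_to_3d):
    """Apply a 4x4 image-to-3D transform to (h, w, D, 4) points, returning (h, w, D, 3)."""
    return (points @ img_to_3d.T)[..., :3]

def position_aware_features(feats, coords3d, weight):
    """Add a linear embedding of the per-cell 3D coordinates to the image features.

    feats: (h, w, C), coords3d: (h, w, D, 3), weight: (D * 3, C).
    """
    h, w, _ = feats.shape
    flat = coords3d.reshape(h, w, -1)  # concatenate all depth bins per cell
    return feats + flat @ weight

# Toy setup (assumed values): 4x4 projection built from pinhole intrinsics only.
K = np.eye(4)
K[0, 0] = K[1, 1] = 100.0
K[0, 2], K[1, 2] = 32.0, 16.0
depths = np.array([1.0, 5.0, 10.0])
pts = frustum_points(2, 4, depths, stride=16)
xyz = pixels_to_3d(pts, np.linalg.inv(K))
rng = np.random.default_rng(0)
fused = position_aware_features(np.zeros((2, 4, 8)), xyz, rng.normal(size=(9, 8)))
```

In a multi-camera setting the same steps would be repeated per camera with that camera's own image-to-3D matrix, so that all views share one 3D coordinate space.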


PETRv2 is a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which uses the temporal information of previous frames to boost 3D object detection. The 3D position embedding (3D PE) aligns object positions across frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support high-quality BEV segmentation, PETRv2 provides a simple yet effective solution by adding a set of segmentation queries. Each segmentation query is responsible for segmenting one specific patch of the BEV map. PETRv2 achieves state-of-the-art performance on 3D object detection and BEV segmentation.
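
Temporal alignment of this kind amounts to moving the previous frame's 3D coordinates into the current frame's coordinate system before the position embedding is computed. A minimal NumPy sketch, assuming each frame comes with a 4x4 ego-to-global pose matrix; the names `prev_to_current` and `align_points` and the toy poses are hypothetical and not taken from the repository:

```python
import numpy as np

def prev_to_current(pose_prev, pose_cur):
    """4x4 transform from the previous ego frame to the current ego frame.

    Both poses are assumed to map ego coordinates into a shared global frame.
    """
    return np.linalg.inv(pose_cur) @ pose_prev

def align_points(points_prev, pose_prev, pose_cur):
    """Map (N, 3) points from the previous ego frame into the current ego frame."""
    ones = np.ones((len(points_prev), 1))
    homo = np.concatenate([points_prev, ones], axis=1)  # (N, 4) homogeneous points
    return (homo @ prev_to_current(pose_prev, pose_cur).T)[:, :3]

# Toy example (assumed values): the ego vehicle moved 2 m along x between frames.
pose_prev = np.eye(4)
pose_cur = np.eye(4)
pose_cur[0, 3] = 2.0
aligned = align_points(np.array([[10.0, 0.0, 0.0]]), pose_prev, pose_cur)
```

A static object 10 m ahead in the previous frame ends up 8 m ahead in the current frame, which is the behaviour the alignment is meant to provide.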

News

2023.01.25 Our multi-view 3D detection framework StreamPETR achieves 63.6% NDS and 55.0% mAP without TTA and future frames.
2023.01.04 Our multi-modal detection framework CMT is released on arXiv.
2022.11.04 The code of the multi-scale improvement in PETRv2 is released.
2022.09.21 The code of the query denoising improvement in PETRv2 is released.
2022.09.04 PETRv2 with VoVNet backbone and multi-scale achieves 59.1% NDS and 50.8% mAP.
2022.08.11 PETRv2 with GLOM-like backbone and query denoising achieves 59.2% NDS and 51.2% mAP without extra data.
2022.07.04 PETR has been accepted by ECCV 2022.
2022.06.28 The code of BEV segmentation in PETRv2 is released.
2022.06.16 The code of 3D object detection in PETRv2 is released.
2022.06.10 The code of PETR is released.
2022.06.06 PETRv2 is released on arXiv.
2022.06.01 PETRv2 achieves another SOTA result on the nuScenes dataset (58.2% NDS and 49.0% mAP) through temporal modeling, and supports BEV segmentation.
2022.03.10 PETR is released on arXiv.
2022.03.08 PETR achieves SOTA performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset.

Preparation

This implementation is built upon detr3d and can be set up by following install.md.

  • Environments
    Linux, Python == 3.6.8, CUDA == 11.2, PyTorch == 1.9.0, mmdet3d == 0.17.1

  • Detection Data
    Follow the mmdet3d instructions to process the nuScenes dataset (https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md).

  • Segmentation Data
    Download Map expansion from nuScenes dataset (https://www.nuscenes.org/nuscenes#download). Extract the contents (folders basemap, expansion and prediction) to your nuScenes maps folder.
    Then build the segmentation dataset:

    cd tools
    python build-dataset.py
    

    If you want to train the segmentation task immediately, we provide the processed data ( HDmaps-final.tar ) at gdrive. The processed info files for segmentation can also be found at gdrive.

  • Pretrained weights
    To verify the performance on the val set, we provide the pretrained V2-99 weights. The V2-99 backbone is pretrained on DDAD15M (weights) and further trained on the nuScenes train set with FCOS3D. For the test set results in the paper, we use the DD3D pretrained weights. The ImageNet pretrained weights of other backbones can be found here. Please put the pretrained weights into ./ckpts/.

  • After preparation, you will be able to see the following directory structure:

    PETR
    ├── mmdetection3d
    ├── projects
    │   ├── configs
    │   ├── mmdet3d_plugin
    ├── tools
    ├── data
    │   ├── nuscenes
    │     ├── HDmaps-nocover
    │     ├── ...
    ├── ckpts
    ├── README.md
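
A quick way to confirm the layout is a small check script. This is only a convenience sketch: the helper name `missing_paths` is hypothetical, the path list is read off the tree above, and `HDmaps-nocover` is assumed to matter only for the segmentation task.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the PETR root.
EXPECTED = [
    "mmdetection3d",
    "projects/configs",
    "projects/mmdet3d_plugin",
    "tools",
    "data/nuscenes",
    "data/nuscenes/HDmaps-nocover",
    "ckpts",
    "README.md",
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]
```

Calling `missing_paths(".")` from the PETR root returns an empty list when every entry is in place, and otherwise lists what still needs to be downloaded or linked.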
    

Train & inference

cd PETR

You can train the model as follows:

tools/dist_train.sh projects/configs/petr/petr_r50dcn_gridmask_p4.py 8 --work-dir work_dirs/petr_r50dcn_gridmask_p4/

You can evaluate the model as follows:

tools/dist_test.sh projects/configs/petr/petr_r50dcn_gridmask_p4.py work_dirs/petr_r50dcn_gridmask_p4/latest.pth 8 --eval bbox
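
When running several configs, the two commands can be assembled programmatically. The sketch below only builds argument lists that mirror the commands above; the helper names `train_command` and `eval_command` are hypothetical, and naming the work directory after the config file is an assumption drawn from the example paths.

```python
from pathlib import Path

def train_command(config, gpus=8):
    """Argument list for tools/dist_train.sh, with the work dir named after the config."""
    work_dir = f"work_dirs/{Path(config).stem}/"
    return ["tools/dist_train.sh", config, str(gpus), "--work-dir", work_dir]

def eval_command(config, gpus=8, checkpoint="latest.pth"):
    """Argument list for tools/dist_test.sh, evaluating a checkpoint from the same work dir."""
    ckpt = f"work_dirs/{Path(config).stem}/{checkpoint}"
    return ["tools/dist_test.sh", config, ckpt, str(gpus), "--eval", "bbox"]
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` from the PETR root directory.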

Visualize

You can generate the result JSON as follows:

./tools/dist_test.sh projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py work_dirs/petr_vovnet_gridmask_p4_800x320/latest.pth 8 --out work_dirs/pp-nus/results_eval.pkl --format-only --eval-options 'jsonfile_prefix=work_dirs/pp-nus/results_eval'
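
To sanity-check the generated file before visualizing, the detections can be counted per class. The sketch assumes the JSON follows the nuScenes detection submission layout (a top-level `results` dict mapping sample tokens to lists of boxes with `detection_name` and `detection_score` fields); the exact output path depends on the tooling, so it is left as a parameter, and the helper names and the 0.3 threshold are hypothetical.

```python
import json
from collections import Counter

def load_results(path):
    """Load a results JSON file from the given path."""
    with open(path) as f:
        return json.load(f)

def count_detections(results, score_thr=0.3):
    """Count boxes per class whose detection_score is at least score_thr."""
    counts = Counter()
    for boxes in results["results"].values():
        for box in boxes:
            if box["detection_score"] >= score_thr:
                counts[box["detection_name"]] += 1
    return counts
```

A file with no boxes above the threshold usually points to a wrong checkpoint or config rather than a visualization problem.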

You can visualize the 3D object detection results as follows:

python3 tools/visualize.py

Main Results

PETR: We provide some results on the nuScenes val set with pretrained models. These models are trained on 8x 2080ti without cbgs. Note that the models and logs are also available at Baidu Netdisk with code petr.

model                 mAP    NDS    training time  config  download
PETR-r50-c5-1408x512  30.5%  35.0%  18 hours       config  log / gdrive
PETR-r50-p4-1408x512  31.7%  36.7%  21 hours       config  log / gdrive
PETR-vov-p4-800x320   37.8%  42.6%  17 hours       config  log / gdrive
PETR-vov-p4-1600x640  40.4%  45.5%  36 hours       config  log / gdrive
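
For readers unfamiliar with the NDS column: the nuScenes detection score combines mAP with five mean true-positive errors (mATE, mASE, mAOE, mAVE, mAAE) as NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10. A minimal sketch of that formula follows; the function name `nds` is hypothetical, and any error values passed in are placeholders rather than results from this repository.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean true-positive errors.

    tp_errors: (mATE, mASE, mAOE, mAVE, mAAE); each error is clipped at 1 before scoring.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error values")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

Because mAP carries half of the total weight, a model can gain NDS without gaining mAP by reducing its translation, scale, orientation, velocity, or attribute errors.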

PETRv2: We provide a 3D object detection baseline and a BEV segmentation baseline with two frames. The models are trained on 8x 2080ti without cbgs. The processed info files contain 30 previous frames, whose transformation matrices are aligned with the current frame. The info files, models and logs are also available at Baidu Netdisk with code petr.

model                  mAP    NDS    training time  config  download
PETRv2-vov-p4-800x320  41.0%  50.3%  30 hours       config  log / gdrive

model          Drive  Lane   Vehicle  backbone  config  download
PETRv2_BEVseg  85.6%  49.0%  46.3%    V2-99     config  log / gdrive

model          F-score  X-near  X-far  Z-near  Z-far  backbone  config  download
PETRv2_3DLane  61.2%    0.400   0.573  0.265   0.413  V2-99

StreamPETR: StreamPETR achieves significant performance improvements over the single-frame baseline without introducing extra computation cost.

model                   mAP    NDS    FPS (PyTorch)  config  download
StreamPETR-r50-704x256  45.0%  55.0%  31.7/s

Acknowledgement

Many thanks to the authors of mmdetection3d and detr3d.

Citation

If you find this project useful for your research, please consider citing:

@article{liu2022petr,
  title={Petr: Position embedding transformation for multi-view 3d object detection},
  author={Liu, Yingfei and Wang, Tiancai and Zhang, Xiangyu and Sun, Jian},
  journal={arXiv preprint arXiv:2203.05625},
  year={2022}
}
@article{liu2022petrv2,
  title={PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images},
  author={Liu, Yingfei and Yan, Junjie and Jia, Fan and Li, Shuailin and Gao, Qi and Wang, Tiancai and Zhang, Xiangyu and Sun, Jian},
  journal={arXiv preprint arXiv:2206.01256},
  year={2022}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected] or [email protected].
