
[CVPR 2022 Oral] Official implementation of DN-DETR

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

By Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, and Lei Zhang.

This repository is the official implementation of DN-DETR, accepted to CVPR 2022 (score 112, Oral presentation). The code is now available. [CVPR paper link] [extended version paper link] [Chinese explanation]

News

[2022/12]: We release an extended version of DN-DETR on arXiv; here is the paper link! We add denoising training to the CNN-based model Faster R-CNN, the segmentation model Mask2Former, and other DETR-like models such as Anchor DETR and DETR to improve their performance.

[2022/12]: Code for Mask DINO is available! Mask DINO further achieves 51.7 and 59.0 box AP on COCO with ResNet-50 and SwinL backbones, respectively, without extra detection data, outperforming DINO under the same setting!

[2022/11]: A DINO implementation based on DN-DETR is released in this repo. Credits to @Vallum! This optimized version reaches 50.8 ~ 51.0 AP with ResNet-50 in 36 epochs.

[2022/9]: We release detrex, a toolbox that provides many state-of-the-art Transformer-based detection algorithms. It includes DN-DETR with better performance. Feel free to use it!

[2022/7]: Code for DINO is available here!

[2022/6]: We release Mask DINO, a unified detection and segmentation model that achieves the best results on all three segmentation tasks (54.5 AP on the COCO instance leaderboard, 59.4 PQ on the COCO panoptic leaderboard, and 60.8 mIoU on the ADE20K semantic leaderboard)! Code will be available here.

[2022/5]: Our code is available! Better performance: 49.5 AP on COCO with ResNet-50.

[2022/4]: Code for DAB-DETR is available here.

[2022/3]: We build a repo, Awesome Detection Transformer, that collects papers on transformers for detection and segmentation. Feel free to check it out!

[2022/3]: DN-DETR is selected for an Oral presentation at CVPR 2022.

[2022/3]: We release another work, DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection, which for the first time establishes a DETR-like model as a SOTA model on the leaderboard. It is also based on DN. Code will be available here.

Introduction

  1. We present a novel denoising training method to speed up DETR training and offer a deeper understanding of the slow convergence issue of DETR-like methods.
  2. DN is only a training method and can be plugged into many DETR-like models, or even traditional models, to boost performance.
  3. DN-DETR achieves 43.4 AP and 48.6 AP with 12 and 50 epochs of training, respectively, with a ResNet-50 backbone. Compared with the baseline models under the same setting, DN-DETR achieves comparable performance with 50% of the training epochs.
  4. Our optimized models perform even better: DN-Deformable-DETR achieves 49.5 AP with a ResNet-50 backbone.

Model

We build upon DAB-DETR and add a denoising part to accelerate training convergence. The denoising part adds only minimal computation and is removed at inference time. We conduct extensive experiments to validate the effectiveness of our denoising training, for example, a comparison of convergence curves. Please refer to our paper for more experimental results.
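The denoising part feeds noised copies of the ground-truth boxes and labels to the decoder as extra queries and trains it to reconstruct the originals. A minimal sketch of the noising step follows, assuming normalized (cx, cy, w, h) boxes and Python's random module in place of tensor ops. The names noise_boxes and flip_labels and the defaults box_noise_scale=0.4 and label_noise_prob=0.2 are illustrative assumptions, not this repository's API.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers for illustration only; not the repository's API.
# Box format assumed here: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def _clamp01(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def noise_boxes(boxes: Sequence[Box], box_noise_scale: float = 0.4,
                rng: Optional[random.Random] = None) -> List[Box]:
    """Jitter each ground-truth box to build the anchor of a denoising query."""
    rng = rng if rng is not None else random.Random()
    noised = []
    for cx, cy, w, h in boxes:
        # Center shift: |dx| <= scale * w / 2, so for scale <= 1 the center stays inside the box.
        dx = rng.uniform(-1.0, 1.0) * box_noise_scale * w / 2
        dy = rng.uniform(-1.0, 1.0) * box_noise_scale * h / 2
        # Size rescale: new width in [(1 - scale) * w, (1 + scale) * w]; same for height.
        nw = w * (1 + rng.uniform(-1.0, 1.0) * box_noise_scale)
        nh = h * (1 + rng.uniform(-1.0, 1.0) * box_noise_scale)
        noised.append((_clamp01(cx + dx), _clamp01(cy + dy), _clamp01(nw), _clamp01(nh)))
    return noised


def flip_labels(labels: Sequence[int], num_classes: int, label_noise_prob: float = 0.2,
                rng: Optional[random.Random] = None) -> List[int]:
    """Replace each label with a random class id with probability label_noise_prob."""
    rng = rng if rng is not None else random.Random()
    return [rng.randrange(num_classes) if rng.random() < label_noise_prob else label
            for label in labels]
```

The decoder is then asked to recover the clean boxes and labels from these noised queries; with box_noise_scale=0 and label_noise_prob=0 the helpers return their inputs unchanged.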

Model Zoo

We provide our models under DAB-DETR, DAB-Deformable-DETR (deformable encoder only), and DAB-Deformable-DETR (see the DAB-DETR code and paper for more details).

You can also refer to our [model zoo in Google Drive] or our [model zoo in Baidu Netdisk] (extraction code: niet).

50 epoch setting

No. | name | backbone | box AP | Log/Config/Checkpoint | Where in Our Paper
0 | DN-DETR-R50 | R50 | 44.4 [1] | Google Drive / Baidu | Table 1
2 | DN-DETR-R50-DC5 | R50 | 46.3 | Google Drive / Baidu | Table 1
5 | DN-DAB-Deformable-DETR (Deformable Encoder Only) [3] | R50 | 48.6 | Google Drive / Baidu | Table 3
6 | DN-DAB-Deformable-DETR-R50-v2 [4] | R50 | 49.5 (48.4 in 24 epochs) | Google Drive / Baidu | Optimized implementation with deformable attention in both encoder and decoder. See DAB-DETR for more details.

12 epoch setting

No. | name | backbone | box AP | Log/Config/Checkpoint | Where in Our Paper
1 | DN-DAB-DETR-R50-DC5 (3 pat) [2] | R50 | 41.7 | Google Drive / Baidu | Table 2
4 | DN-DAB-DETR-R101-DC5 (3 pat) [2] | R101 | 42.8 | Google Drive / Baidu | Table 2
5 | DN-DAB-Deformable-DETR (Deformable Encoder Only) [3] | R50 | 43.4 | Google Drive / Baidu | Table 2
5 | DN-DAB-Deformable-DETR (Deformable Encoder Only) [3] | R101 | 44.1 | Google Drive / Baidu | Table 2

Notes:

  • 1: The result is higher than the one reported in our paper (44.4 vs. 44.1) because we optimized the code. We did not rerun the other models, so you can expect better performance than reported in our paper.
  • 2: Models marked (3 pat) are trained with multiple pattern embeddings (refer to Anchor DETR or DAB-DETR for more details).
  • 3: This model is based on DAB-Deformable-DETR (Deformable Encoder Only), a multi-scale version of DAB-DETR. It requires 16 GPUs to train because it uses deformable attention only in the encoder.
  • 4: This model is based on DAB-Deformable-DETR, an optimized implementation of Deformable DETR. See DAB-DETR for more details. We encourage you to use this deformable version: it uses deformable attention in both the encoder and decoder, which makes it more lightweight (it can be trained with 4 or 8 A100 GPUs) and faster to converge (it achieves 48.4 in 24 epochs, comparable to the 50-epoch DAB-Deformable-DETR).

Usage

How to use denoising training in your own model

Our code largely follows DAB-DETR and adds components for denoising training, which are wrapped in the file dn_components.py. There are three main functions: prepare_for_dn and dn_post_process (both used in your detection model's forward function to process the DN part), and compute_dn_loss (used to calculate the DN loss). You can import these functions and add them to your own detection model. If you would like to use them in your own detection models, you may also compare DN-DETR and DAB-DETR to see how these functions are added.
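To make the division of labor concrete, here is a minimal, framework-free sketch of what the DN part has to handle around the decoder. It is not the code in dn_components.py: the names build_dn_attn_mask, split_dn_outputs, and dn_l1_loss are hypothetical, plain lists stand in for tensors, and only an L1 box term is shown. It assumes the layout described in the DN-DETR paper: num_groups denoising groups, each holding one noised query per ground-truth object, placed before the matching queries, with an attention mask that keeps matching queries from seeing denoising queries and keeps denoising groups from seeing each other.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; names and signatures are not the repository's.
Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def build_dn_attn_mask(num_gt: int, num_groups: int, num_queries: int) -> List[List[bool]]:
    """Self-attention mask; True means query i may NOT attend to query j.

    Layout assumed: num_gt * num_groups denoising queries first, then num_queries matching queries.
    """
    pad_size = num_gt * num_groups
    total = pad_size + num_queries
    mask = [[False] * total for _ in range(total)]
    # Matching queries must not see denoising queries, which are built from ground truth.
    for i in range(pad_size, total):
        for j in range(pad_size):
            mask[i][j] = True
    # Each denoising group must not see the other denoising groups.
    for g in range(num_groups):
        start, end = g * num_gt, (g + 1) * num_gt
        for i in range(start, end):
            for j in range(pad_size):
                if not start <= j < end:
                    mask[i][j] = True
    return mask


def split_dn_outputs(pred_boxes: Sequence[Box], pad_size: int) -> Tuple[List[Box], List[Box]]:
    """Split decoder predictions into (denoising part, matching part)."""
    return list(pred_boxes[:pad_size]), list(pred_boxes[pad_size:])


def dn_l1_loss(dn_pred: Sequence[Box], gt_boxes: Sequence[Box], num_groups: int) -> float:
    """Mean per-box L1 reconstruction loss over all denoising queries."""
    # Query k of every group was built from gt_boxes[k], so targets repeat group by group.
    targets = list(gt_boxes) * num_groups
    if len(targets) != len(dn_pred):
        raise ValueError("expected num_gt * num_groups denoising predictions")
    total = sum(abs(p - t) for pb, tb in zip(dn_pred, targets) for p, t in zip(pb, tb))
    return total / max(len(targets), 1)
```

Because the target of every denoising query is known by construction, the reconstruction loss needs no Hungarian matching; only the matching part goes through the usual bipartite matching. A full loss would typically also include classification and IoU-style terms.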

You are also encouraged to apply it to other DETR-like models, or even traditional detection models, and update the results in this repo.

Installation

We use the DAB-DETR project as our codebase, so DN-DETR needs no extra dependencies. For DN-Deformable-DETR, you need to compile the deformable attention operator manually.

We test our models under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.

  1. Clone this repo
git clone https://github.com/IDEA-Research/DN-DETR.git
cd DN-DETR
  2. Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/.

# an example:
conda install -c pytorch pytorch torchvision
  3. Install other needed packages
pip install -r requirements.txt
  4. Compile the CUDA operators
cd models/dn_dab_deformable_detr/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
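After installation, a quick way to see how your environment compares with the tested versions (python 3.7.3, pytorch 1.9.0, cuda 11.1) is a small script like the one below. It is a minimal sketch, not part of this repository; it uses only the standard library plus torch.__version__ and torch.version.cuda if PyTorch is importable, and the helper names are hypothetical.

```python
import platform
import re
from typing import Dict, Tuple

# Versions this README reports as tested; other versions may also work.
TESTED = {"python": (3, 7, 3), "pytorch": (1, 9, 0), "cuda": (11, 1)}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '1.9.0+cu111' -> (1, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def report_environment() -> Dict[str, Tuple[int, ...]]:
    """Collect the Python, PyTorch, and CUDA versions that are available."""
    found = {"python": version_tuple(platform.python_version())}
    try:
        import torch  # optional: only present once PyTorch is installed
    except ImportError:
        return found
    found["pytorch"] = version_tuple(str(torch.__version__))
    if torch.version.cuda is not None:  # None for CPU-only builds
        found["cuda"] = version_tuple(torch.version.cuda)
    return found


def _fmt(version: Tuple[int, ...]) -> str:
    return ".".join(str(part) for part in version)


if __name__ == "__main__":
    for name, version in report_environment().items():
        status = "matches" if version == TESTED[name] else "differs from"
        print(f"{name} {_fmt(version)} {status} tested {_fmt(TESTED[name])}")
```

A mismatch is not necessarily a problem; it only tells you where to look first if compilation or training behaves differently.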

Data

Please download the COCO 2017 dataset and organize it as follows:

COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json

Run

We use the standard DN-DETR-R50 and DN-Deformable-DETR-R50 as examples for training and evaluation.

Evaluate our pretrained models

Download our DN-DETR-R50 model checkpoint from this link and run the command below. You can expect a final AP of about 44.4.

For our DN-DAB-Deformable-DETR_Deformable_Encoder_Only (download here), the expected final AP is 48.6.

For our DN-DAB-Deformable-DETR (download here), the expected final AP is 49.5.

# replace /path/to/your/COCODIR with your COCO path and /path/to/our/checkpoint with your checkpoint path

# for dn_detr: 44.1 AP in the paper; the optimized result is 44.4 AP
python main.py -m dn_dab_detr \
  --output_dir logs/dn_DABDETR/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --use_dn \
  --eval

# for dn_deformable_detr: 49.5 AP
python main.py -m dn_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --transformer_activation relu \
  --use_dn \
  --eval

# for dn_deformable_detr_deformable_encoder_only: 48.6 AP (uses 3 pattern embeddings)
python main.py -m dn_dab_deformable_detr_deformable_encoder_only \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --transformer_activation relu \
  --num_patterns 3 \
  --use_dn \
  --eval
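If you prefer to drive evaluation from Python, for example to evaluate several checkpoints in a loop, the sketch below builds the same argument list as the dn_dab_detr evaluation command and hands it to subprocess.run. Passing a list sidesteps shell line-continuation and comment pitfalls. The helper names are hypothetical, the paths are placeholders, and the script is assumed to run from the repository root where main.py lives.

```python
import subprocess
import sys
from typing import List


def dn_dab_detr_eval_argv(coco_path: str, checkpoint: str,
                          output_dir: str = "logs/dn_DABDETR/R50") -> List[str]:
    """Argument list mirroring the dn_dab_detr evaluation command."""
    return [
        sys.executable, "main.py", "-m", "dn_dab_detr",
        "--output_dir", output_dir,
        "--batch_size", "1",
        "--coco_path", coco_path,
        "--resume", checkpoint,
        "--use_dn",
        "--eval",
    ]


def run_eval(coco_path: str, checkpoint: str) -> int:
    """Run one evaluation from the repository root and return its exit code."""
    argv = dn_dab_detr_eval_argv(coco_path, checkpoint)
    return subprocess.run(argv, check=False).returncode
```

For several checkpoints, call run_eval("/path/to/your/COCODIR", path) once per checkpoint path; sys.executable keeps the child process on the same interpreter and environment.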

Training your own models

Similarly, you can also train our model on a single process:

# for dn_detr (replace /path/to/your/COCODIR with your COCO path)
python main.py -m dn_dab_detr \
  --output_dir logs/dn_DABDETR/R50 \
  --batch_size 1 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR \
  --use_dn
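With --epochs 50 --lr_drop 40, the learning rate is dropped once, after 40 epochs. The sketch below shows the resulting schedule, assuming the codebase follows DETR in using torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_drop) with its default gamma of 0.1, and assuming a base learning rate of 1e-4 (an illustrative value, not read from this repository's config).

```python
def step_lr(epoch: int, base_lr: float = 1e-4, lr_drop: int = 40, gamma: float = 0.1) -> float:
    """Learning rate for a 0-indexed epoch under a StepLR-style schedule.

    The rate is multiplied by gamma once every lr_drop epochs.
    """
    return base_lr * gamma ** (epoch // lr_drop)


if __name__ == "__main__":
    # --epochs 50 --lr_drop 40: 40 epochs at base_lr, then 10 epochs at base_lr * gamma.
    schedule = [step_lr(e) for e in range(50)]
    print(f"epochs 0-39: {schedule[0]:.0e}, epochs 40-49: {schedule[40]:.0e}")
```

Under these assumptions, 40 of the 50 epochs run at the base rate and the last 10 at one tenth of it.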

Distributed Run

However, as training is time-consuming, we suggest training the model on multiple devices.

If you plan to train the models on a cluster with Slurm, here is an example command for training:

# for dn_detr: 44.4 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name DNDETR \
  --coco_path /path/to/your/COCODIR \
  -m dn_dab_detr \
  --job_dir logs/dn_DABDETR/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40 \
  --use_dn

# for dn_dab_deformable_detr: 49.5 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name dn_dab_deformable_detr \
  --coco_path /path/to/your/COCODIR \
  -m dab_deformable_detr \
  --transformer_activation relu \
  --job_dir logs/dn_dab_deformable_detr/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40 \
  --use_dn

# for dn_dab_deformable_detr_deformable_encoder_only: 48.6 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name dn_dab_deformable_detr_deformable_encoder_only \
  --coco_path /path/to/your/COCODIR \
  -m dn_dab_deformable_detr_deformable_encoder_only \
  --transformer_activation relu \
  --job_dir logs/dn_dab_deformable_detr/R50_%j \
  --num_patterns 3 \
  --batch_size 1 \
  --ngpus 8 \
  --nodes 2 \
  --epochs 50 \
  --lr_drop 40 \
  --use_dn

If you want to train our DC version or multiple-pattern version, add

--dilation  # for DC version

--num_patterns 3  # for 3 patterns

However, this requires additional training resources and memory, i.e., 16 GPUs.

The final AP should be similar to or better than ours, as our optimized result is better than the performance reported in the paper (for example, we report 44.1 for DN-DETR, but our new result reaches 44.4; don't be surprised if you get a better result!).

Our training settings are the same as DAB-DETR's except for the additional argument --use_dn, so you may also refer to DAB-DETR.

Notes:

  • The results are sensitive to the batch size. We use 16 (2 images per GPU x 8 GPUs) by default.

Alternatively, run with multiple processes on a single node:

# for dn_dab_detr: 44.4 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dn_dab_detr \
  --output_dir logs/dn_DABDETR/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR \
  --use_dn

# for dn_deformable_detr: 49.5 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dn_dab_deformable_detr \
  --output_dir logs/dn_dab_deformable_detr/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --transformer_activation relu \
  --coco_path /path/to/your/COCODIR \
  --use_dn
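Since results are sensitive to the batch size, it is worth checking that each launch configuration keeps the default total of 16 images per step. The arithmetic is batch_size (images per GPU) x GPUs per node x nodes; the sketch below applies it to the flag values used in the commands above. The LaunchConfig type is a hypothetical helper for illustration.

```python
from typing import NamedTuple


class LaunchConfig(NamedTuple):
    name: str
    batch_size: int     # --batch_size: images per GPU
    gpus_per_node: int  # --ngpus (Slurm) or --nproc_per_node (torch.distributed.launch)
    nodes: int          # --nodes; 1 for a single machine

    @property
    def total_batch_size(self) -> int:
        return self.batch_size * self.gpus_per_node * self.nodes


# Flag values copied from the Slurm and single-node commands in this README.
CONFIGS = [
    LaunchConfig("dn_dab_detr (Slurm)", 2, 8, 1),
    LaunchConfig("dn_dab_deformable_detr (Slurm)", 2, 8, 1),
    LaunchConfig("dn_dab_deformable_detr_deformable_encoder_only (Slurm)", 1, 8, 2),
    LaunchConfig("dn_dab_detr (single node)", 2, 8, 1),
    LaunchConfig("dn_deformable_detr (single node)", 2, 8, 1),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.batch_size} x {cfg.gpus_per_node} GPUs x "
              f"{cfg.nodes} node(s) = {cfg.total_batch_size}")
```

The encoder-only model reaches the same total of 16 with 1 image per GPU across 2 nodes, which is why it needs 16 GPUs.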

Links

Our work is based on DAB-DETR. We also release another SOTA detection model, DINO, based on DN-DETR and DAB-DETR.

  • DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
    Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
    arxiv 2022.
    [paper] [code].

  • DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.
    Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang.
    International Conference on Learning Representations (ICLR) 2022.
    [Paper] [Code].

LICENSE

DN-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Copyright (c) IDEA. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Bibtex

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{li2022dn,
  title={Dn-detr: Accelerate detr training by introducing query denoising},
  author={Li, Feng and Zhang, Hao and Liu, Shilong and Guo, Jian and Ni, Lionel M and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13619--13627},
  year={2022}
}
