PyTorch Implementation of Sparse DETR

Sparse DETR (ICLR'22)

By Byungseok Roh*, Jaewoong Shin*, Wuhyun Shin*, and Saehoon Kim at Kakao Brain. (*: Equal contribution)

Introduction

TL;DR. Sparse DETR is an efficient end-to-end object detector that sparsifies encoder tokens using a learnable DAM (Decoder Attention Map) predictor. It outperforms Deformable DETR on the COCO dataset even with only 10% of the encoder queries.

Abstract. DETR is the first end-to-end object detector using a transformer encoder-decoder architecture and demonstrates competitive performance but low computational efficiency on high-resolution feature maps. The subsequent work, Deformable DETR, enhances the efficiency of DETR by replacing dense attention with deformable attention, which achieves 10x faster convergence and improved performance. Deformable DETR uses multiscale features to improve performance; however, the number of encoder tokens increases by 20x compared to DETR, and the computation cost of the encoder attention remains a bottleneck. In our preliminary experiment, we observe that the detection performance hardly deteriorates even if only a part of the encoder tokens is updated. Inspired by this observation, we propose Sparse DETR, which selectively updates only the tokens expected to be referenced by the decoder, thus helping the model detect objects effectively. In addition, we show that applying an auxiliary detection loss on the selected tokens in the encoder improves the performance while minimizing computational overhead. We validate that Sparse DETR achieves better performance than Deformable DETR even with only 10% of the encoder tokens on the COCO dataset. Although only the encoder tokens are sparsified, the total computation cost decreases by 38% and the frames per second (FPS) increases by 42% compared to Deformable DETR.
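
To make the keeping ratio concrete, the following minimal sketch keeps the top ρ fraction of encoder tokens ranked by a saliency score. It is a plain-Python illustration, not the repository's implementation: the function name `select_top_rho` and the example scores are hypothetical, and in Sparse DETR the scores would come from the learned DAM predictor.

```python
from typing import List, Sequence


def select_top_rho(scores: Sequence[float], rho: float) -> List[int]:
    """Return indices of the top `rho` fraction of tokens, highest score first.

    Hypothetical helper for illustration only; not the repository's API.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    # Number of tokens to keep, rounded to the nearest integer, at least one.
    num_keep = max(1, round(rho * len(scores)))
    # Rank token indices by descending score; ties keep the lower index first.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:num_keep]


# Made-up saliency scores for 10 encoder tokens.
scores = [0.05, 0.90, 0.10, 0.30, 0.02, 0.75, 0.01, 0.20, 0.60, 0.15]
kept = select_top_rho(scores, rho=0.3)
print(kept)  # [1, 5, 8]
```

Only the tokens at the returned indices would be updated by the encoder in this sketch; the remaining tokens would be passed through unchanged.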

Installation

Requirements

We have tested the code on the following environments:

  • Python 3.7.7 / PyTorch 1.6.0 / torchvision 0.7.0 / CUDA 10.1 / Ubuntu 18.04
  • Python 3.8.3 / PyTorch 1.7.1 / torchvision 0.8.2 / CUDA 11.1 / Ubuntu 18.04

Run the following command to install dependencies:

pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download the COCO 2017 dataset and organize it as follows:

code_root/
└── data/
    └── coco/
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json
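
Before launching training, it can help to confirm that the layout above is in place. The following is a minimal sketch using only the standard library; the helper name `missing_coco_entries` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path
from typing import List

# Entries expected under <code_root>/data/coco, taken from the tree above.
EXPECTED_ENTRIES = [
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]


def missing_coco_entries(code_root: str) -> List[str]:
    """Return the expected COCO entries that are absent under `code_root`."""
    coco_dir = Path(code_root) / "data" / "coco"
    return [entry for entry in EXPECTED_ENTRIES if not (coco_dir / entry).exists()]


if __name__ == "__main__":
    # Assumes the script is run from code_root.
    missing = missing_coco_entries(".")
    if missing:
        print("Missing under data/coco:", ", ".join(missing))
    else:
        print("COCO 2017 layout looks complete.")
```

The check only tests for existence; it does not validate the image files or the annotation JSON contents.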

Training

Training on a single node

For example, the command for training Sparse DETR with the keeping ratio of 10% on 8 GPUs is as follows:

$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/swint_sparse_detr_rho_0.1.sh

Training on multiple nodes

For example, the command for training Sparse DETR with the keeping ratio of 10% on 2 nodes, each with 8 GPUs, is as follows:

On node 1:

$ MASTER_ADDR=<IP address of node 1> NODE_RANK=0 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh

On node 2:

$ MASTER_ADDR=<IP address of node 1> NODE_RANK=1 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/swint_sparse_detr_rho_0.1.sh
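
The relationship between NODE_RANK, GPUS_PER_NODE, and the total process count of 16 can be sketched as follows. This assumes the launcher follows the usual distributed-training convention in which each GPU runs one process and global ranks are assigned node by node; the helper names are hypothetical and not taken from the repository.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of processes when each GPU runs one process."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, gpus_per_node: int, local_rank: int) -> int:
    """Global rank of the process driving GPU `local_rank` on node `node_rank`."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


print(world_size(num_nodes=2, gpus_per_node=8))  # 16
for node_rank in range(2):
    ranks = [global_rank(node_rank, 8, local) for local in range(8)]
    # NODE_RANK=0 corresponds to "node 1" in the commands above.
    print(f"NODE_RANK={node_rank}: global ranks {ranks[0]}..{ranks[-1]}")
```

Under this convention, the first node hosts global ranks 0 to 7 and the second node hosts ranks 8 to 15.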

Direct argument control

# Deformable DETR (with bounding-box-refinement and two-stage argument, if wanted)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage
# Efficient DETR (with the class-specific head as described in their paper)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head
# Sparse DETR (with the keeping ratio of 10% and encoder auxiliary loss)
$ GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 python main.py --with_box_refine --two_stage --eff_query_init --eff_specific_head --rho 0.1 --use_enc_aux_loss
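
The three commands above differ only in which flags are set. The sketch below shows one way such flags could be declared and interpreted with argparse. It is an illustration, not the repository's main.py: the flag names come from the commands above, while the defaults, the `build_parser` and `describe` helpers, and the mapping from flags to model names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown above."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing sketch")
    parser.add_argument("--with_box_refine", action="store_true")
    parser.add_argument("--two_stage", action="store_true")
    parser.add_argument("--eff_query_init", action="store_true")
    parser.add_argument("--eff_specific_head", action="store_true")
    # Keeping ratio; a default of None (no sparsification) is an assumption.
    parser.add_argument("--rho", type=float, default=None)
    parser.add_argument("--use_enc_aux_loss", action="store_true")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Name the variant a flag combination corresponds to in the commands above."""
    if args.rho is not None:
        return f"Sparse DETR (rho={args.rho})"
    if args.eff_query_init:
        return "Efficient DETR"
    return "Deformable DETR"


sparse_args = build_parser().parse_args(
    "--with_box_refine --two_stage --eff_query_init --eff_specific_head "
    "--rho 0.1 --use_enc_aux_loss".split()
)
print(describe(sparse_args))  # Sparse DETR (rho=0.1)
```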

Some tips to speed up training

  • If your file system is slow to read images, consider enabling the '--cache_mode' option to load the whole dataset into memory at the beginning of training.
  • You may increase the batch size to maximize GPU utilization, depending on your GPU memory, e.g., set '--batch_size 3' or '--batch_size 4'.

Evaluation

You can download a pre-trained Sparse DETR model (the links are in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 validation set:

# Note that you should run the command with the corresponding configuration.
$ ./configs/swint_sparse_detr_rho_0.1.sh --resume <path to pre-trained model> --eval

You can also run distributed evaluation by using ./tools/run_dist_launch.sh.

Main Results

The tables below demonstrate the detection performance of Sparse DETR on the COCO 2017 validation set when using different backbones.

  • Top-k: sampling the top-k object queries instead of using the learned object queries (as in Efficient DETR).
  • BBR: performing bounding box refinement in the decoder block (as in Deformable DETR).
  • The encoder auxiliary loss proposed in our paper is only applied to Sparse DETR.
  • FLOPs and FPS are measured in the same way as used in Deformable DETR.
  • Refer to Table 1 in the paper for more details.

ResNet-50 backbone

Method              Epochs  ρ    Top-k & BBR  AP    #Params(M)  GFLOPs  B4FPS  Download
Faster R-CNN + FPN  109     N/A  -            42.0  42M         180G    26     -
DETR                50      N/A  -            35.0  41M         86G     28     -
DETR                500     N/A  -            42.0  41M         86G     28     -
DETR-DC5            500     N/A  -            43.3  41M         187G    12     -
PnP-DETR            500     33%  -            41.1  -           -       -      -
PnP-DETR            500     50%  -            41.8  -           -       -      -
PnP-DETR-DC5        500     33%  -            42.7  -           -       -      -
PnP-DETR-DC5        500     50%  -            43.1  -           -       -      -
Deformable-DETR     50      N/A  -            43.9  39.8M       172.9G  19.1   -
Deformable-DETR     50      N/A  o            46.0  40.8M       177.3G  18.2   -
Sparse-DETR         50      10%  o            45.3  40.9M       105.4G  26.5   link
Sparse-DETR         50      20%  o            45.6  40.9M       112.9G  24.8   link
Sparse-DETR         50      30%  o            46.0  40.9M       120.5G  23.2   link
Sparse-DETR         50      40%  o            46.2  40.9M       128.0G  21.8   link
Sparse-DETR         50      50%  o            46.3  40.9M       135.6G  20.5   link
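
As a worked example of reading the ResNet-50 table, the sketch below computes the relative change in GFLOPs and FPS between Deformable-DETR with Top-k & BBR (177.3G, 18.2) and Sparse-DETR at ρ = 10% (105.4G, 26.5). The numbers are copied from the table; the helper `relative_change` is hypothetical.

```python
def relative_change(baseline: float, new: float) -> float:
    """Signed relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Values copied from the ResNet-50 table (rows with Top-k & BBR enabled).
deformable = {"gflops": 177.3, "fps": 18.2}
sparse_rho_10 = {"gflops": 105.4, "fps": 26.5}

gflops_change = relative_change(deformable["gflops"], sparse_rho_10["gflops"])
fps_change = relative_change(deformable["fps"], sparse_rho_10["fps"])
print(f"GFLOPs: {gflops_change:+.1f}%")  # GFLOPs: -40.6%
print(f"FPS:    {fps_change:+.1f}%")     # FPS:    +45.6%
```

The same helper can be applied to any other pair of rows, for example to compare different keeping ratios against each other.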

Swin-T backbone

Method           Epochs  ρ    Top-k & BBR  AP    #Params(M)  GFLOPs  B4FPS  Download
DETR             50      N/A  -            35.9  45.0M       91.6G   26.8   -
DETR             500     N/A  -            45.4  45.0M       91.6G   26.8   -
Deformable-DETR  50      N/A  -            45.7  40.3M       180.4G  15.9   -
Deformable-DETR  50      N/A  o            48.0  41.3M       184.8G  15.4   -
Sparse-DETR      50      10%  o            48.2  41.4M       113.4G  21.2   link
Sparse-DETR      50      20%  o            48.8  41.4M       121.0G  20     link
Sparse-DETR      50      30%  o            49.1  41.4M       128.5G  18.9   link
Sparse-DETR      50      40%  o            49.2  41.4M       136.1G  18     link
Sparse-DETR      50      50%  o            49.3  41.4M       143.7G  17.2   link

Initializing ResNet-50 backbone with SCRL

The performance of Sparse DETR can be further improved by initializing the backbone network with SCRL (Spatially Consistent Representation Learning), which aims to learn dense representations in a self-supervised way, instead of the default ImageNet pre-trained initialization (denoted as IN-sup in the table below).

  • We obtained pre-trained weights from Torchvision for IN-sup, and the SCRL GitHub repository for SCRL.
  • To reproduce the SCRL results, add --scrl_pretrained_path <downloaded_filepath> to the training command.

Method       ρ    AP(IN-sup)  AP(SCRL)  AP(gain)  Download
Sparse DETR  10%  45.3        46.9      +1.6      link
Sparse DETR  20%  45.6        47.2      +1.7      link
Sparse DETR  30%  46.0        47.4      +1.4      link
Sparse DETR  40%  46.2        47.7      +1.5      link
Sparse DETR  50%  46.3        47.9      +1.6      link

Citation

If you find Sparse DETR useful in your research, please consider citing:

@inproceedings{roh2022sparse,
  title={Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity},
  author={Roh, Byungseok and Shin, JaeWoong and Shin, Wuhyun and Kim, Saehoon},
  booktitle={ICLR},
  year={2022}
}

License

This project is released under the Apache 2.0 license. Copyright 2021 Kakao Brain Corp. All Rights Reserved.
