Revisiting RCNN: On Awakening the Classification Power of Faster RCNN (ECCV 2018)

Decoupled Classification Refinement for Object Detection

This is the official implementation of our ECCV 2018 paper "Revisiting RCNN: On Awakening the Classification Power of Faster RCNN" (https://arxiv.org/abs/1803.06799) and its extension "Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection" (https://arxiv.org/abs/1810.04002).

Introduction

Decoupled Classification Refinement was initially described in an ECCV 2018 paper (we call it DCR V1) and further extended (we call it DCR V2) in a recent tech report. In this extension, we speed up the original DCR V1 by 3x with the same accuracy. Unlike DCR V1, which requires complicated two-stage training, DCR V2 is simpler and can be trained end-to-end. A minimal sketch of the score-fusion idea follows the figures below.

[Figure] High-level structure of the DCR modules.

[Figure] Detailed DCR V2 module.
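To make the idea concrete, here is a minimal sketch (not the repository's code) of DCR-style score fusion: the DCR head re-classifies each RoI cropped from the raw image, and, following the papers' description, its class probabilities are multiplied with the base detector's. All names below are illustrative.

    import numpy as np

    def fuse_scores(detector_probs, dcr_probs):
        """Fuse per-RoI class probabilities from the detector head and the
        DCR refinement classifier (both arrays: num_rois x num_classes)."""
        # The final score used for ranking/NMS is the elementwise product of
        # the two classifiers' softmax outputs, as described in the DCR papers.
        return detector_probs * dcr_probs

    # Example: one RoI, three classes; the DCR head suppresses a hard false
    # positive that the detector scored highly (0.85 -> 0.17).
    det = np.array([[0.10, 0.85, 0.05]])
    dcr = np.array([[0.60, 0.20, 0.20]])
    print(fuse_scores(det, dcr))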

News

Disclaimer

This is an official implementation of Decoupled Classification Refinement based on MXNet. It is worth noting that:

  • The code is tested on official MXNet version 1.1.0 installed using pip.
  • We trained our model based on the ImageNet pre-trained ResNet-v1-101 obtained with a model converter. The converted model produces slightly lower accuracy (Top-1 error on ImageNet val: 24.0% vs. 23.6%).
  • This repository is based on Deformable ConvNets.

License

© University of Illinois at Urbana-Champaign, 2018. Licensed under the MIT License.

Citing DCR

If you find the Decoupled Classification Refinement module useful in your research, please consider citing:

@article{cheng18decoupled,
author = {Cheng, Bowen and Wei, Yunchao and Shi, Honghui and Feris, Rogerio and Xiong, Jinjun and Huang, Thomas},
title = {Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection},
journal = {arXiv preprint arXiv:1810.04002},
year = {2018}
}

@inproceedings{cheng18revisiting,
author = {Cheng, Bowen and Wei, Yunchao and Shi, Honghui and Feris, Rogerio and Xiong, Jinjun and Huang, Thomas},
title = {Revisiting RCNN: On Awakening the Classification Power of Faster RCNN},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

Main Results

For simplicity, train, val, and test-dev refer to the COCO2017 train split, the COCO2017 val split, and COCO test-dev, respectively.
Notes:

  • All FPN models are trained with OHEM, following Deformable ConvNets.
  • The prefix D- means adding deformable convolutions and replacing ROIPooling with deformable ROIPooling.
  • No multi-scale train/test, no soft-NMS, no ensembling: these are purely single-model results without any test-time tricks!

COCO test-dev

| method | training data | testing data | AP | AP@0.5 | AP@0.75 | AP@S | AP@M | AP@L |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN (2fc), ResNet-v1-101 | trainval | test-dev | 30.5 | 52.2 | 31.8 | 9.7 | 32.3 | 48.3 |
| + DCR V1, ResNet-v1-101/152 | trainval | test-dev | 33.9 | 57.9 | 35.3 | 14.0 | 36.1 | 50.8 |
| + DCR V2, ResNet-v1-101 | trainval | test-dev | 34.3 | 57.7 | 35.8 | 13.8 | 36.7 | 51.1 |
| D-Faster R-CNN (2fc), ResNet-v1-101 | trainval | test-dev | 35.2 | 55.1 | 38.2 | 14.6 | 37.4 | 52.6 |
| + DCR V1, ResNet-v1-101/152 | trainval | test-dev | 38.1 | 59.7 | 41.1 | 17.9 | 41.2 | 54.7 |
| + DCR V2, ResNet-v1-101 | trainval | test-dev | 38.2 | 59.7 | 41.2 | 17.3 | 41.7 | 54.6 |
| FPN, ResNet-v1-101 | trainval | test-dev | 38.8 | 61.7 | 42.6 | 21.9 | 42.1 | 49.7 |
| + DCR V1, ResNet-v1-101/152 | trainval | test-dev | 40.7 | 64.4 | 44.6 | 24.3 | 43.7 | 51.9 |
| + DCR V2, ResNet-v1-101 | trainval | test-dev | 40.8 | 63.6 | 44.5 | 24.3 | 44.3 | 52.0 |
| D-FPN, ResNet-v1-101 | trainval | test-dev | 41.7 | 64.0 | 45.9 | 23.7 | 44.7 | 53.4 |
| + DCR V1, ResNet-v1-101/152 | trainval | test-dev | 43.1 | 66.1 | 47.3 | 25.8 | 45.9 | 55.3 |
| + DCR V2, ResNet-v1-101 | trainval | test-dev | 43.5 | 65.9 | 47.6 | 25.8 | 46.6 | 55.9 |

COCO validation

| method | training data | testing data | AP | AP@0.5 | AP@0.75 | AP@S | AP@M | AP@L |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN (2fc), ResNet-v1-101 | train | val | 30.0 | 50.9 | 30.9 | 9.9 | 33.0 | 49.1 |
| + DCR V1, ResNet-v1-101/152 | train | val | 33.1 | 56.3 | 34.2 | 13.8 | 36.2 | 51.5 |
| + DCR V2, ResNet-v1-101 | train | val | 33.6 | 56.7 | 34.7 | 13.5 | 37.1 | 52.2 |
| D-Faster R-CNN (2fc), ResNet-v1-101 | train | val | 34.4 | 53.8 | 37.2 | 14.4 | 37.7 | 53.1 |
| + DCR V1, ResNet-v1-101/152 | train | val | 37.2 | 58.6 | 39.9 | 17.3 | 41.2 | 55.5 |
| + DCR V2, ResNet-v1-101 | train | val | 37.5 | 58.6 | 40.1 | 17.2 | 42.0 | 55.5 |
| FPN, ResNet-v1-101 | train | val | 38.2 | 61.1 | 41.9 | 21.8 | 42.3 | 50.3 |
| + DCR V1, ResNet-v1-101/152 | train | val | 40.2 | 63.8 | 44.0 | 24.3 | 43.9 | 52.6 |
| + DCR V2, ResNet-v1-101 | train | val | 40.3 | 62.9 | 43.7 | 24.3 | 44.6 | 52.7 |
| D-FPN + OHEM, ResNet-v1-101 | train | val | 41.4 | 63.5 | 45.3 | 24.4 | 45.0 | 55.1 |
| + DCR V1, ResNet-v1-101/152 | train | val | 42.6 | 65.3 | 46.5 | 26.4 | 46.1 | 56.4 |
| + DCR V2, ResNet-v1-101 | train | val | 42.8 | 65.1 | 46.8 | 27.1 | 46.6 | 56.1 |

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet version 1.1.0. Due to the rapid development of MXNet, it is recommended to check out this version if you encounter any issues. We may update this repository periodically as MXNet adds important features in future releases.

  2. Python 2.7. We recommend using Anaconda2, as it already includes many common packages. We do not support Python 3 yet; if you want to use Python 3, you will need to modify the code yourself.

  3. Python packages that might be missing: cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running the command below (a sample requirements.txt is sketched after this list):

    pip install -r requirements.txt
    
  4. For Windows users, Visual Studio 2015 is needed to compile the cython module.
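
For reference, a minimal requirements.txt consistent with the packages named in step 3 might look like the following (the repository ships its own file, whose exact contents may differ):

    cython
    easydict
    opencv-python>=3.2.0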

Requirements: Hardware

For experiments without FPN, our models are trained on NVIDIA GTX 1080 Ti GPUs (requires > 10 GB of GPU memory).
For experiments with FPN, our models are trained on NVIDIA Tesla V100 GPUs (requires > 15 GB of GPU memory).

Installation

  1. Clone the Decoupled Classification Refinement repository; we'll refer to the cloned directory as ${DCR_ROOT}:

     git clone https://github.com/bowenc0221/Decoupled-Classification-Refinement.git

  2. For Windows users, run cmd .\init.bat. For Linux users, run sh ./init.sh. The script builds the cython module automatically and creates some folders. (Steps 1 and 2 are combined in the snippet after this list.)

  3. Install MXNet following this link
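
On Linux, steps 1 and 2 combine to the following shell commands (the cd into the cloned directory, i.e. ${DCR_ROOT}, is implied rather than spelled out above):

    git clone https://github.com/bowenc0221/Decoupled-Classification-Refinement.git
    cd Decoupled-Classification-Refinement  # this is ${DCR_ROOT}
    sh ./init.sh                            # builds the cython module and creates folders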

Preparation for Training & Testing

  1. Please download the COCO2017 train/val datasets (note: although COCO2014 and COCO2017 contain exactly the same images, their image naming conventions differ; see the renaming sketch after this list), and make sure the layout looks like this:

    ./data/coco/
    
  2. Please download the ImageNet-pretrained ResNet-v1-101 model manually from OneDrive and put it under the folder ./model. Make sure it looks like this:

    ./model/pretrained_model/resnet_v1_101-0000.params
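
Because COCO2014 and COCO2017 share images but name them differently, the following illustrative Python helper (not part of this repository) shows the mapping; the split prefix and zero-padded 12-digit id are the standard COCO naming conventions:

    import re

    def coco2014_to_coco2017(name):
        """Map 'COCO_train2014_000000000009.jpg' -> '000000000009.jpg'."""
        m = re.match(r"COCO_(?:train|val)2014_(\d{12}\.jpg)$", name)
        return m.group(1) if m else name  # already 2017-style (or unrecognized)

    print(coco2014_to_coco2017("COCO_train2014_000000000009.jpg"))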
    

COCO Models

You can download our COCO models via [Google Drive].
To test a model, follow these steps (taking resnet_v1_101_coco_train2017_dcr_end2end.params as an example; the steps are combined in the snippet after this list):

  1. Move resnet_v1_101_coco_train2017_dcr_end2end.params to ./output/dcr/coco/resnet_v1_101_coco_train2017_dcr_end2end/train2017/rcnn_coco-0008.params
  2. Run python experiments/faster_rcnn_dcr/rcnn_test.py --cfg experiments/faster_rcnn_dcr/cfgs/resnet_v1_101_coco_train2017_dcr_end2end.yaml to evaluate
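
Equivalently, as shell commands (the mkdir -p is an assumption so the move succeeds on a fresh checkout):

    mkdir -p output/dcr/coco/resnet_v1_101_coco_train2017_dcr_end2end/train2017
    mv resnet_v1_101_coco_train2017_dcr_end2end.params \
       output/dcr/coco/resnet_v1_101_coco_train2017_dcr_end2end/train2017/rcnn_coco-0008.params
    python experiments/faster_rcnn_dcr/rcnn_test.py \
       --cfg experiments/faster_rcnn_dcr/cfgs/resnet_v1_101_coco_train2017_dcr_end2end.yaml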

Usage

  1. All of our experiment settings (number of GPUs, dataset, etc.) are kept in YAML config files under ./experiments/faster_rcnn_dcr/cfgs and ./experiments/fpn_dcr/cfgs.

  2. Eight config files are provided so far: Faster R-CNN (2fc), Deformable Faster R-CNN (2fc), FPN, and Deformable FPN for COCO, each with its DCR version. We use 4 GPUs to train all models on COCO.

  3. To perform experiments, run the Python scripts with the corresponding config file as input. For example, to train and test Deformable ConvNets + DCR on COCO with ResNet-v1-101, use the following command:

    python experiments/faster_rcnn_dcr/rcnn_end2end_train_test.py --cfg experiments/faster_rcnn_dcr/cfgs/resnet_v1_101_coco_trainval_dcn_dcr_end2end.yaml
    

    A cache folder will be created automatically under output/dcn_dcr/coco/ to save the model and the log. (Note: the command above automatically runs testing after training.)
    To test the model only, use:

    python experiments/faster_rcnn_dcr/rcnn_test.py --cfg experiments/faster_rcnn_dcr/cfgs/resnet_v1_101_coco_trainval_dcn_dcr_end2end.yaml
    
  4. Please find more details in the config files and in our code.

Note

Code for DCR V1 is on the dcr_v1 branch.

Contact

Bowen Cheng (bcheng9 AT illinois DOT edu)
Homepage: https://bowenc0221.github.io/
