  • Language: Python
  • License: MIT License

M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

M2Det

Codebase for AAAI2019 "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network" [Paper link]

Author: Qijie Zhao. Date: 19/01/2019

Introduction

Motivation:

Beyond scale variation, variation in appearance complexity should also be considered in object detection, because object instances of similar size can look quite different.

To address this, we extend multi-scale detection with a new dimension: multi-level. Deeper levels learn features for objects with more appearance-complexity variation (e.g., pedestrians), while shallower levels learn features for simpler objects (e.g., traffic lights).

1. We propose the Multi-Level FPN (MLFPN).

2. Based on MLFPN, we propose a single-shot object detector, M2Det, which stands for Multi-Level Multi-Scale Detector.

Methodology:

a. Construct the base feature:

We use the output of FFMv1 (Feature Fusion Module v1) as the base feature. Its size is fixed at (c=768, w=W/8, h=H/8), where (W, H) denotes the input image size. FFMv1 takes two inputs: a shallower feature of size (W/4, H/4), which keeps more detail, and a deeper feature of size (W/8, H/8), which carries more semantics. For the VGG16-reduced backbone, these are conv4-3 and conv6-2 (the layer names may differ between implementations). For the ResNet series, we first change the stride of Res4 from 2 to 1 and then use the outputs of Res3 and Res5 as the inputs of FFMv1; the outputs of Res4 and Res5 can be used instead.
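
The size bookkeeping in this step can be sketched in a few lines of plain Python. This is only an illustration of the sizes stated above, not code from the repository: the function name ffmv1_shapes and the divisibility check are assumptions made for the example, and the actual fusion operations inside FFMv1 are not modeled.

```python
def ffmv1_shapes(width, height, base_channels=768):
    """Return the feature sizes stated in the text for an input of (width, height).

    Hypothetical helper for illustration only; it does no convolution.
    """
    if width % 8 or height % 8:
        # Assumption of this sketch: the input size divides evenly by 8.
        raise ValueError("this sketch assumes the input size is divisible by 8")
    shallow = (width // 4, height // 4)   # shallower input: keeps more detail
    deep = (width // 8, height // 8)      # deeper input: carries more semantics
    base = (base_channels, width // 8, height // 8)  # base feature as (c, w, h)
    return {"shallow": shallow, "deep": deep, "base": base}


shapes = ffmv1_shapes(512, 512)
print(shapes["base"])  # (768, 64, 64)
```

For a 512x512 input, the base feature is therefore 768 channels at 64x64.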

b. The Multi-level Multi-scale feature:

Given the base feature, we build the multi-level multi-scale feature (M2F). For each TUM (Thinned U-shaped Module), a leach layer (a 1x1 conv layer inside FFMv2 that extracts a thinner feature from the base feature) produces a feature that is concatenated with the output of the previous TUM to form the input of the current TUM. Finally, the pyramidal features of similar scale are aggregated across all levels.
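
The wiring described here can be sketched as channel arithmetic. The sketch rests on several assumptions that the text does not state: the function name m2f_plan is hypothetical, every TUM is assumed to emit the same number of channels at every scale, the first TUM is assumed to receive only the leached feature because it has no predecessor, and aggregation is assumed to be channel-wise concatenation. The numbers in the usage line are illustrative values, not settings taken from the configs.

```python
def m2f_plan(num_levels, num_scales, leach_channels, tum_channels):
    """Return (input channels per TUM, aggregated channels per scale).

    Hypothetical helper: it only counts channels, it builds no layers.
    """
    tum_inputs = []
    for level in range(num_levels):
        if level == 0:
            # Assumption: no previous TUM exists, so only the leached feature is used.
            tum_inputs.append(leach_channels)
        else:
            # Leached base feature concatenated with the previous TUM's output.
            tum_inputs.append(leach_channels + tum_channels)
    # Assumption: features of the same scale are concatenated across all levels.
    aggregated = [num_levels * tum_channels for _ in range(num_scales)]
    return tum_inputs, aggregated


# Illustrative values only (8 levels, 6 scales, 256 leached, 128 per TUM output).
tum_inputs, aggregated = m2f_plan(8, 6, 256, 128)
print(tum_inputs)
print(aggregated)
```

Under these assumptions, each scale ends up with num_levels * tum_channels channels, which is what the aggregation module in the next step operates on.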

c. Scale-wise Feature Aggregation Module:

Having obtained the multi-level multi-scale feature, we re-weight it so that the features focus on the most useful channels/levels. Depending on the compression ratio, we apply an SE attention module to the feature at each scale to learn attention along the channel dimension.
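
To make the channel re-weighting concrete, here is a minimal, framework-free sketch of a generic SE (squeeze-and-excitation) block applied to the feature of one scale. It is not the repository's implementation: the function name se_attention, the list-of-lists tensor layout, and the omission of bias terms are all simplifying assumptions. The compression ratio shows up as the ratio between the number of channels and the number of rows in w_reduce.

```python
import math


def se_attention(feature, w_reduce, w_expand):
    """Apply SE-style channel attention.

    feature:  list of C channels, each a flat list of spatial values
    w_reduce: (C/r) x C weight matrix (r is the compression ratio)
    w_expand: C x (C/r) weight matrix
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature]
    # Excitation, part 1: reduce to C/r values, then ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    # Excitation, part 2: expand back to C values, then sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature)]


# 4 channels with 2 spatial values each; compression ratio 2 gives 2 hidden units.
feature = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]]
w_reduce = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
w_expand = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
out = se_attention(feature, w_reduce, w_expand)
print(len(out), len(out[0]))  # 4 2
```

The output has the same shape as the input; only the per-channel magnitudes change.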

Schedule

  • 13 Nov, 2018 - Release the paper
  • 1 Mar, 2019 - Release the training, evaluation, multi-scale evaluation and inference demo code + provide a pretrained model
  • 1 Apr, 2019 - Release most of the pretrained models

Preparation

The supported version is PyTorch 0.4.1.

  • Prepare python environment using Anaconda3.
  • Install the deep learning framework, i.e., PyTorch, torchvision, and other libraries.
conda install pytorch==0.4.1 torchvision -c pytorch
pip install opencv-python tqdm
  • Clone this repository.
git clone https://github.com/qijiezhao/M2Det.git
  • Compile the nms and coco tools:
sh make.sh
  • Prepare the dataset (e.g., VOC, COCO); refer to ssd.pytorch for detailed instructions.

Demo

We provide an M2Det512_vgg pretrained model for demonstration (visualization):

First, download the pretrained m2det512_vgg.pth file (baidu cloud, google drive). Then move the file to weights/.

  python demo.py -c=configs/m2det512_vgg.py -m=weights/m2det512_vgg.pth --show

The image is then shown with the detected boxes drawn on it.

You can also run a real-time demo with your webcam by specifying the camera's device ID with the --cam option.

  python demo.py -c=configs/m2det512_vgg.py -m=weights/m2det512_vgg.pth --show --cam=0

In addition, I strongly suggest changing the NMS type from soft-NMS to hard-NMS for faster visualization. Soft-NMS improves mAP, but it brings no benefit for demos or visualization.

Thanks to the volunteers for their demonstrations of M2Det: entry1, entry2, entry3.
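
To illustrate why the two NMS variants behave differently, the following sketch implements textbook versions of both in plain Python. It is not the compiled NMS that make.sh builds for this repository; the function names, the (x1, y1, x2, y2) box format, the Gaussian score decay, and the threshold defaults are assumptions chosen for the example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_thresh=0.5):
    """Keep the best box, drop every box overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: decay overlapping scores instead of dropping boxes."""
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append(best)
        for i in remaining:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] > score_thresh]
    return keep, scores


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(hard_nms(boxes, scores))     # [0, 2]
print(soft_nms(boxes, scores)[0])  # [0, 2, 1]
```

Hard-NMS discards the heavily overlapping second box outright, while soft-NMS keeps it with a reduced score, so more boxes survive and have to be drawn in a demo.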

Evaluation

1. We provide an evaluation script for M2Det:

  python test.py -c=configs/m2det512_vgg.py -m=weights/m2det512_vgg.pth

The evaluation result is then displayed:

Even higher than our paper's original result! :)

2. You can run M2Det on the test set to produce a result file for scoring:

  python test.py -c=configs/m2det512_vgg.py -m=weights/m2det512_vgg.pth --test

Then submit the result file on the CodaLab webpage.

Training

Training is as simple as the demo and evaluation; just use the training script:

  CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py -c=configs/m2det512_vgg.py --ngpu 4 -t True

All training and model configs are defined in configs/*.py.

Multi-scale Evaluation

To be added.

Pre-trained Files

Currently, we only provide m2det512_vgg.pth (baidu cloud, google drive) because of other recent commitments; we plan to release the other models in the future.

Others

Citation:

Please cite the following paper if you find M2Det useful for your research:

@inproceedings{M2Det2019aaai,
  author    = {Qijie Zhao and
               Tao Sheng and
               Yongtao Wang and
               Zhi Tang and
               Ying Chen and
               Ling Cai and
               Haibin Ling},
  title     = {M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network},
  booktitle = {The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI},
  year      = {2019},
}

Contact

For any questions, please file an issue or contact

Qijie Zhao: [email protected]
