Object Detection
Method | VOC2007 (mAP) | VOC2010 (mAP) | VOC2012 (mAP) | ILSVRC 2013 (mAP) | MSCOCO 2015 (AP) | Speed |
---|---|---|---|---|---|---|
OverFeat | - | - | - | 24.3% | - | - |
R-CNN (AlexNet) | 58.5% | 53.7% | 53.3% | 31.4% | - | - |
R-CNN (VGG16) | 66.0% | - | - | - | - | - |
SPP-Net (ZF-5) | 54.2%(1-model), 60.9%(2-model) | - | - | 31.84%(1-model), 35.11%(6-model) | - | - |
DeepID-Net | 64.1% | - | - | 50.3% | - | - |
NoC | 73.3% | - | 68.8% | - | - | - |
Fast-RCNN (VGG16) | 70.0% | 68.8% | 68.4% | - | 19.7%(@[0.5-0.95]), 35.9%(@0.5) | - |
MR-CNN | 78.2% | - | 73.9% | - | - | - |
Faster-RCNN (VGG16) | 78.8% | - | 75.9% | - | 21.9%(@[0.5-0.95]), 42.7%(@0.5) | 198ms |
Faster-RCNN (ResNet-101) | 85.6% | - | 83.8% | - | 37.4%(@[0.5-0.95]), 59.0%(@0.5) | - |
SSD300 (VGG16) | 72.1% | - | - | - | - | 58 fps |
SSD500 (VGG16) | 75.1% | - | - | - | - | 23 fps |
ION | 79.2% | - | 76.4% | - | - | - |
CRAFT | 75.7% | - | 71.3% | 48.5% | - | - |
OHEM | 78.9% | - | 76.3% | - | 25.5%(@[0.5-0.95]), 45.9%(@0.5) | - |
R-FCN (ResNet-50) | 77.4% | - | - | - | - | 0.12sec(K40), 0.09sec(TitanX) |
R-FCN (ResNet-101) | 79.5% | - | - | - | - | 0.17sec(K40), 0.12sec(TitanX) |
R-FCN (ResNet-101), multi-scale train | 83.6% | - | 82.0% | - | 31.5%(@[0.5-0.95]), 53.2%(@0.5) | - |
PVANet 9.0 | 81.8% | - | 82.5% | - | - | 750ms(CPU), 46ms(TitanX) |
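
The MSCOCO column reports AP at two IoU settings. `@0.5` counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5, while `@[0.5-0.95]` averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (the COCO convention). Below is a minimal sketch of the IoU computation only, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`. The function name and box format are illustrative and not taken from any of the codebases listed here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds averaged for the "@[0.5-0.95]" figures: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

A detection would then count as a match at a given threshold `t` when `iou(pred, gt) >= t`. The full AP computation (ranking by score, precision/recall integration) is not shown.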
Leaderboard
Detection Results: VOC2012
- intro: Competition “comp4” (train on additional data)
- homepage: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4
Papers
Deep Neural Networks for Object Detection
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
- arxiv: http://arxiv.org/abs/1312.6229
- github: https://github.com/sermanet/OverFeat
- code: http://cilvr.nyu.edu/doku.php?id=software:overfeat:start
R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation
- intro: R-CNN
- arxiv: http://arxiv.org/abs/1311.2524
- supp: http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf
- slides: http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf
- slides: http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf
- github: https://github.com/rbgirshick/rcnn
- notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/
- caffe-pr(“Make R-CNN the Caffe detection example”): BVLC/caffe#482
MultiBox
Scalable Object Detection using Deep Neural Networks
- intro: the first MultiBox paper. Trains a CNN to predict regions of interest.
- arxiv: http://arxiv.org/abs/1312.2249
- github: https://github.com/google/multibox
- blog: https://research.googleblog.com/2014/12/high-quality-object-detection-at-scale.html
Scalable, High-Quality Object Detection
- intro: second MultiBox
- arxiv: http://arxiv.org/abs/1412.1441
- github: https://github.com/google/multibox
SPP-Net
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
- intro: ECCV 2014 / TPAMI 2015
- arxiv: http://arxiv.org/abs/1406.4729
- github: https://github.com/ShaoqingRen/SPP_net
- notes: http://zhangliliang.com/2014/09/13/paper-note-sppnet/
DeepID-Net
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
- intro: PAMI 2016
- intro: an extension of R-CNN with box pre-training, a cascade on region proposals, deformation layers, and context representations
- project page: http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html
- arxiv: http://arxiv.org/abs/1412.5661
Object Detectors Emerge in Deep Scene CNNs
- arxiv: http://arxiv.org/abs/1412.6856
- paper: https://www.robots.ox.ac.uk/~vgg/rg/papers/zhou_iclr15.pdf
- paper: https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf
- slides: http://places.csail.mit.edu/slide_iclr2015.pdf
segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection
- intro: CVPR 2015
- project(code+data): https://www.cs.toronto.edu/~yukun/segdeepm.html
- arxiv: https://arxiv.org/abs/1502.04275
- github: https://github.com/YknZhu/segDeepM
NoC
Object Detection Networks on Convolutional Feature Maps
- intro: TPAMI 2015
- arxiv: http://arxiv.org/abs/1504.06066
Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
- arxiv: http://arxiv.org/abs/1504.03293
- slides: http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf
- github: https://github.com/YutingZhang/fgs-obj
Fast R-CNN
Fast R-CNN
- arxiv: http://arxiv.org/abs/1504.08083
- slides: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
- github: https://github.com/rbgirshick/fast-rcnn
- webcam demo: rbgirshick/fast-rcnn#29
- notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/
- notes: http://blog.csdn.net/linj_m/article/details/48930179
- github(“Fast R-CNN in MXNet”): https://github.com/precedenceguo/mx-rcnn
- github: https://github.com/mahyarnajibi/fast-rcnn-torch
- github: https://github.com/apple2373/chainer-simple-fast-rnn
- github(Tensorflow): https://github.com/zplizzi/tensorflow-fast-rcnn
DeepBox
DeepBox: Learning Objectness with Convolutional Networks
MR-CNN
Object detection via a multi-region & semantic segmentation-aware CNN model
- intro: ICCV 2015. MR-CNN
- arxiv: http://arxiv.org/abs/1505.01749
- github: https://github.com/gidariss/mrcnn-object-detection
- notes: http://zhangliliang.com/2015/05/17/paper-note-ms-cnn/
- notes: http://blog.cvmarcher.com/posts/2015/05/17/multi-region-semantic-segmentation-aware-cnn/
- my notes: Who can tell me why there are a bunch of duplicated sentences in section 7.2 “Detection error analysis”? :-D
Faster R-CNN
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- intro: NIPS 2015
- arxiv: http://arxiv.org/abs/1506.01497
- gitxiv: http://www.gitxiv.com/posts/8pfpcvefDYn2gSgXk/faster-r-cnn-towards-real-time-object-detection-with-region
- slides: http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/slides/w05-FasterR-CNN.pdf
- github: https://github.com/ShaoqingRen/faster_rcnn
- github: https://github.com/rbgirshick/py-faster-rcnn
- github: https://github.com/mitmul/chainer-faster-rcnn
- github(Torch): https://github.com/andreaskoepf/faster-rcnn.torch
- github(Torch): https://github.com/ruotianluo/Faster-RCNN-Densecap-torch
- github(Tensorflow): https://github.com/smallcorgi/Faster-RCNN_TF
- github(Tensorflow): https://github.com/CharlesShang/TFFRCNN
Faster R-CNN in MXNet with distributed implementation and data parallelization
Contextual Priming and Feedback for Faster R-CNN
- intro: ECCV 2016. Carnegie Mellon University
- paper: http://abhinavsh.info/context_priming_feedback.pdf
- poster: http://www.eccv2016.org/files/posters/P-1A-20.pdf
An Implementation of Faster RCNN with Study for Region Sampling
- intro: Technical Report, 3 pages. CMU
- arxiv: https://arxiv.org/abs/1702.02138
- github: https://github.com/endernewton/tf-faster-rcnn
YOLO
You Only Look Once: Unified, Real-Time Object Detection
- arxiv: http://arxiv.org/abs/1506.02640
- code: http://pjreddie.com/darknet/yolo/
- github: https://github.com/pjreddie/darknet
- reddit: https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/
- github: https://github.com/gliese581gg/YOLO_tensorflow
- github: https://github.com/xingwangsfu/caffe-yolo
- github: https://github.com/frankzhangrui/Darknet-Yolo
- github: https://github.com/BriSkyHekun/py-darknet-yolo
- github: https://github.com/tommy-qichang/yolo.torch
- github: https://github.com/frischzenger/yolo-windows
- github: https://github.com/AlexeyAB/yolo-windows
darkflow: translates Darknet to TensorFlow. Loads trained weights, retrains/fine-tunes them using TensorFlow, and exports a constant graph def to C++
- blog: https://thtrieu.github.io/notes/yolo-tensorflow-graph-buffer-cpp
- github: https://github.com/thtrieu/darkflow
Start Training YOLO with Our Own Data
- intro: train with customized data and class numbers/labels. Linux / Windows version for darknet.
- blog: http://guanghan.info/blog/en/my-works/train-yolo/
- github: https://github.com/Guanghan/darknet
R-CNN minus R
AttentionNet
AttentionNet: Aggregating Weak Directions for Accurate Object Detection
- intro: ICCV 2015
- intro: state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 human detection task
- arxiv: http://arxiv.org/abs/1506.07704
- slides: https://www.robots.ox.ac.uk/~vgg/rg/slides/AttentionNet.pdf
- slides: http://image-net.org/challenges/talks/lunit-kaist-slide.pdf
DenseBox
DenseBox: Unifying Landmark Localization with End to End Object Detection
- arxiv: http://arxiv.org/abs/1509.04874
- demo: http://pan.baidu.com/s/1mgoWWsS
- KITTI result: http://www.cvlibs.net/datasets/kitti/eval_object.php
SSD
SSD: Single Shot MultiBox Detector
- intro: ECCV 2016 Oral
- arxiv: http://arxiv.org/abs/1512.02325
- paper: http://www.cs.unc.edu/~wliu/papers/ssd.pdf
- slides: http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf
- github: https://github.com/weiliu89/caffe/tree/ssd
- video: http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973
- github(MXNet): https://github.com/zhreshold/mxnet-ssd
- github: https://github.com/zhreshold/mxnet-ssd.cpp
- github(Keras): https://github.com/rykov8/ssd_keras
Inside-Outside Net (ION)
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
- intro: “0.8s per image on a Titan X GPU (excluding proposal generation) without two-stage bounding-box regression and 1.15s per image with it”.
- arxiv: http://arxiv.org/abs/1512.04143
- slides: http://www.seanbell.ca/tmp/ion-coco-talk-bell2015.pdf
- coco-leaderboard: http://mscoco.org/dataset/#detections-leaderboard
Adaptive Object Detection Using Adjacency and Zoom Prediction
- intro: CVPR 2016. AZ-Net
- arxiv: http://arxiv.org/abs/1512.07711
- github: https://github.com/luyongxi/az-net
- youtube: https://www.youtube.com/watch?v=YmFtuNwxaNM
G-CNN
G-CNN: an Iterative Grid Based Object Detector
Factors in Finetuning Deep Model for object detection
Factors in Finetuning Deep Model for Object Detection with Long-tail Distribution
- intro: CVPR 2016. Ranked 3rd for provided data and 2nd for external data on ILSVRC 2015 object detection
- project page: http://www.ee.cuhk.edu.hk/~wlouyang/projects/ImageNetFactors/CVPR16.html
- arxiv: http://arxiv.org/abs/1601.05150
We don’t need no bounding-boxes: Training object class detectors using only human verification
HyperNet
HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
MultiPathNet
A MultiPath Network for Object Detection
- intro: BMVC 2016. Facebook AI Research (FAIR)
- arxiv: http://arxiv.org/abs/1604.02135
- github: https://github.com/facebookresearch/multipathnet
CRAFT
CRAFT Objects from Images
- intro: CVPR 2016. Cascade Region-proposal-network And FasT-rcnn. An extension of Faster R-CNN
- project page: http://byangderek.github.io/projects/craft.html
- arxiv: https://arxiv.org/abs/1604.03239
- paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yang_CRAFT_Objects_From_CVPR_2016_paper.pdf
- github: https://github.com/byangderek/CRAFT
OHEM
Training Region-based Object Detectors with Online Hard Example Mining
- intro: CVPR 2016 Oral. Online hard example mining (OHEM)
- arxiv: http://arxiv.org/abs/1604.03540
- paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf
- github(Official): https://github.com/abhi2610/ohem
- author page: http://abhinav-shrivastava.info/
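The idea named in the intro, online hard example mining, is to train on the examples the current model handles worst. Below is a minimal sketch of that selection step only, assuming per-RoI losses have already been computed in a forward pass. `select_hard_examples` and its parameters are hypothetical names, and the sketch leaves out the rest of the paper's training pipeline.

```python
def select_hard_examples(losses, k):
    """Return the indices of the k highest-loss examples, hardest first."""
    # Rank example indices by their current loss, largest loss first.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]


# Hypothetical per-RoI losses from one forward pass.
roi_losses = [0.1, 2.0, 0.5, 1.5]
hard_indices = select_hard_examples(roi_losses, k=2)
```

Under this assumption, only the RoIs in `hard_indices` would contribute to the backward pass for that iteration.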
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection
- intro: CVPR 2016
- arxiv: http://arxiv.org/abs/1604.05766
Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
- intro: scale-dependent pooling (SDP), cascaded rejection classifiers (CRC)
- paper: http://www-personal.umich.edu/~wgchoi/SDP-CRC_camready.pdf
R-FCN
R-FCN: Object Detection via Region-based Fully Convolutional Networks
- arxiv: http://arxiv.org/abs/1605.06409
- github: https://github.com/daijifeng001/R-FCN
- github: https://github.com/Orpine/py-R-FCN
Weakly supervised object detection using pseudo-strong labels
Recycle deep features for better object detection
MS-CNN
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
- intro: ECCV 2016
- intro: 640×480: 15 fps, 960×720: 8 fps
- arxiv: http://arxiv.org/abs/1607.07155
- github: https://github.com/zhaoweicai/mscnn
- poster: http://www.eccv2016.org/files/posters/P-2B-38.pdf
Multi-stage Object Detection with Group Recursive Learning
- intro: VOC2007: 78.6%, VOC2012: 74.9%
- arxiv: http://arxiv.org/abs/1608.05159
Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection
- intro: WACV 2017. SubCNN
- arxiv: http://arxiv.org/abs/1604.04693
- github: https://github.com/yuxng/SubCNN
PVANET
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection
- intro: “less channels with more layers”, concatenated ReLU, Inception, and HyperNet, batch normalization, residual connections
- arxiv: http://arxiv.org/abs/1608.08021
- github: https://github.com/sanghoon/pva-faster-rcnn
- leaderboard(PVANet 9.0): http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4
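Of the building blocks named in the intro, concatenated ReLU is the easiest to show in isolation. In its basic formulation it keeps both the positive and the negative phase of an activation by concatenating `ReLU(x)` with `ReLU(-x)`, which doubles the channel count. Below is a minimal sketch on a plain list of floats, assuming that basic formulation. `crelu` is an illustrative name, and the exact block used in PVANET may differ in detail.

```python
def relu(values):
    """Element-wise max(x, 0)."""
    return [max(v, 0.0) for v in values]


def crelu(values):
    """Concatenate ReLU(x) with ReLU(-x); the output is twice as long as the input."""
    return relu(values) + relu([-v for v in values])


activations = [1.0, -2.0]
doubled = crelu(activations)  # positive phase first, then negative phase
```

Because the negative phase is produced by a simple negation rather than by extra filters, a layer can compute half as many convolution outputs and still feed a full-width activation to the next layer.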
PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
- intro: Presented at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN). Continuation of arXiv:1608.08021
- arxiv: https://arxiv.org/abs/1611.08588
GBD-Net
Gated Bi-directional CNN for Object Detection
- intro: The Chinese University of Hong Kong & Sensetime Group Limited
- paper: http://link.springer.com/chapter/10.1007/978-3-319-46478-7_22
- mirror: https://pan.baidu.com/s/1dFohO7v
Crafting GBD-Net for Object Detection
- intro: winner of the ImageNet object detection challenge of 2016. CUImage and CUVideo
- intro: gated bi-directional CNN (GBD-Net)
- arxiv: https://arxiv.org/abs/1610.02579
- github: https://github.com/craftGBD/craftGBD
StuffNet
StuffNet: Using ‘Stuff’ to Improve Object Detection
Generalized Haar Filter based Deep Networks for Real-Time Object Detection in Traffic Scene
Hierarchical Object Detection with Deep Reinforcement Learning
- intro: Deep Reinforcement Learning Workshop (NIPS 2016)
- project page: https://imatge-upc.github.io/detection-2016-nipsws/
- arxiv: https://arxiv.org/abs/1611.03718
- slides: http://www.slideshare.net/xavigiro/hierarchical-object-detection-with-deep-reinforcement-learning
- github: https://github.com/imatge-upc/detection-2016-nipsws
- blog: http://jorditorres.org/nips/
Learning to detect and localize many objects from few examples
Speed/accuracy trade-offs for modern convolutional object detectors
- intro: Google Research
- arxiv: https://arxiv.org/abs/1611.10012
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
Feature Pyramid Network (FPN)
Feature Pyramid Networks for Object Detection
- intro: Facebook AI Research
- arxiv: https://arxiv.org/abs/1612.03144
Action-Driven Object Detection with Top-Down Visual Attentions
Beyond Skip Connections: Top-Down Modulation for Object Detection
- intro: CMU & UC Berkeley & Google Research
- arxiv: https://arxiv.org/abs/1612.06851
YOLOv2
YOLO9000: Better, Faster, Stronger
- arxiv: https://arxiv.org/abs/1612.08242
- code: http://pjreddie.com/yolo9000/
- github(Chainer): https://github.com/leetenki/YOLOv2
DSSD
DSSD : Deconvolutional Single Shot Detector
- intro: UNC Chapel Hill & Amazon Inc
- arxiv: https://arxiv.org/abs/1701.06659
Wide-Residual-Inception Networks for Real-time Object Detection
- intro: Inha University
- arxiv: https://arxiv.org/abs/1702.01243
Attentional Network for Visual Object Detection
- intro: University of Maryland & Mitsubishi Electric Research Laboratories
- arxiv: https://arxiv.org/abs/1702.01478
Detection From Video
Learning Object Class Detectors from Weakly Annotated Video
- intro: CVPR 2012
- paper: https://www.vision.ee.ethz.ch/publications/papers/proceedings/eth_biwi_00905.pdf
Analysing domain shift factors between videos and images for object detection
Video Object Recognition
Deep Learning for Saliency Prediction in Natural Video
- intro: Submitted on 12 Jan 2016
- keywords: Deep learning, saliency map, optical flow, convolution network, contrast features
- paper: https://hal.archives-ouvertes.fr/hal-01251614/document
T-CNN
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
- intro: Winning solution in ILSVRC2015 Object Detection from Video (VID) Task
- arxiv: http://arxiv.org/abs/1604.02532
- github: https://github.com/myfavouritekk/T-CNN
Object Detection from Video Tubelets with Convolutional Neural Networks
- intro: CVPR 2016 Spotlight paper
- arxiv: https://arxiv.org/abs/1604.04053
- paper: http://www.ee.cuhk.edu.hk/~wlouyang/Papers/KangVideoDet_CVPR16.pdf
- github: https://github.com/myfavouritekk/vdetlib
Object Detection in Videos with Tubelets and Multi-context Cues
- intro: SenseTime Group
- slides: http://www.ee.cuhk.edu.hk/~xgwang/CUvideo.pdf
- slides: http://image-net.org/challenges/talks/Object%20Detection%20in%20Videos%20with%20Tubelets%20and%20Multi-context%20Cues%20-%20Final.pdf
Context Matters: Refining Object Detection in Video with Recurrent Neural Networks
- intro: BMVC 2016
- keywords: pseudo-labeler
- arxiv: http://arxiv.org/abs/1607.04648
- paper: http://vision.cornell.edu/se3/wp-content/uploads/2016/07/video_object_detection_BMVC.pdf
CNN Based Object Detection in Large Video Images
- intro: Wang Tao @ iQIYI
- keywords: object retrieval, object detection, scene classification
- slides: http://on-demand.gputechconf.com/gtc/2016/presentation/s6362-wang-tao-cnn-based-object-detection-large-video-images.pdf
Datasets
YouTube-Objects dataset v2.2
ILSVRC2015: Object detection from video (VID)
Object Detection in 3D
Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks
Object Detection on RGB-D
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
Differential Geometry Boosts Convolutional Neural Networks for Object Detection
- intro: CVPR 2016
- paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016_workshops/w23/html/Wang_Differential_Geometry_Boosts_CVPR_2016_paper.html
Salient Object Detection
This task involves predicting the salient regions of an image, as indicated by human eye fixations.
Best Deep Saliency Detection Models (CVPR 2016 & 2015)
http://i.cs.hku.hk/~yzyu/vision.html
Large-scale optimization of hierarchical features for saliency prediction in natural images
Predicting Eye Fixations using Convolutional Neural Networks
Saliency Detection by Multi-Context Deep Learning
DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection
SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection
Shallow and Deep Convolutional Networks for Saliency Prediction
Recurrent Attentional Networks for Saliency Detection
- intro: CVPR 2016. recurrent attentional convolutional-deconvolution network (RACDNN)
- arxiv: http://arxiv.org/abs/1604.03227
Two-Stream Convolutional Networks for Dynamic Saliency Prediction
Unconstrained Salient Object Detection
Unconstrained Salient Object Detection via Proposal Subset Optimization
- intro: CVPR 2016
- project page: http://cs-people.bu.edu/jmzhang/sod.html
- paper: http://cs-people.bu.edu/jmzhang/SOD/CVPR16SOD_camera_ready.pdf
- github: https://github.com/jimmie33/SOD
- caffe model zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-object-proposal-models-for-salient-object-detection
DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection
Salient Object Subitizing
- intro: CVPR 2015
- intro: predicting the existence and the number of salient objects in an image using holistic cues
- project page: http://cs-people.bu.edu/jmzhang/sos.html
- arxiv: http://arxiv.org/abs/1607.07525
- paper: http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf
- caffe model zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-models-for-salient-object-subitizing
Deeply-Supervised Recurrent Convolutional Neural Network for Saliency Detection
- intro: ACMMM 2016. deeply-supervised recurrent convolutional neural network (DSRCNN)
- arxiv: http://arxiv.org/abs/1608.05177
Saliency Detection via Combining Region-Level and Pixel-Level Predictions with CNNs
- intro: ECCV 2016
- arxiv: http://arxiv.org/abs/1608.05186
Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection
A Deep Multi-Level Network for Saliency Prediction
Visual Saliency Detection Based on Multiscale Deep CNN Features
- intro: IEEE Transactions on Image Processing
- arxiv: http://arxiv.org/abs/1609.02077
A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection
- intro: DSCLRCN
- arxiv: https://arxiv.org/abs/1610.01708
Deeply supervised salient object detection with short connections
Weakly Supervised Top-down Salient Object Detection
- intro: Nanyang Technological University
- arxiv: https://arxiv.org/abs/1611.05345
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
- project page: https://imatge-upc.github.io/saliency-salgan-2017/
- arxiv: https://arxiv.org/abs/1701.01081
Visual Saliency Prediction Using a Mixture of Deep Neural Networks
A Fast and Compact Salient Score Regression Network Based on Fully Convolutional Network
Saliency Detection in Video
Deep Learning For Video Saliency Detection
Datasets
MSRA10K Salient Object Database
Specific Object Detection
Face Detection
Multi-view Face Detection Using Deep Convolutional Neural Networks
- intro: Yahoo
- arxiv: http://arxiv.org/abs/1502.02766
From Facial Parts Responses to Face Detection: A Deep Learning Approach
Compact Convolutional Neural Network Cascade for Face Detection
Face Detection with End-to-End Integration of a ConvNet and a 3D Model
- intro: ECCV 2016
- arxiv: https://arxiv.org/abs/1606.00850
- github(MXNet): https://github.com/tfwu/FaceDetection-ConvNet-3D
CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection
- intro: CMU
- arxiv: https://arxiv.org/abs/1606.05413
Finding Tiny Faces
- intro: CMU
- arxiv: https://arxiv.org/abs/1612.04402
Towards a Deep Learning Framework for Unconstrained Face Detection
- intro: overlap with CMS-RCNN
- arxiv: https://arxiv.org/abs/1612.05322
Supervised Transformer Network for Efficient Face Detection
UnitBox
UnitBox: An Advanced Object Detection Network
- intro: ACM MM 2016
- arxiv: http://arxiv.org/abs/1608.01471
Bootstrapping Face Detection with Hard Negative Examples
- author: Shaohua Wan @ Xiaomi.
- intro: Faster R-CNN, hard negative mining. State-of-the-art on the FDDB dataset
- arxiv: http://arxiv.org/abs/1608.02236
Grid Loss: Detecting Occluded Faces
- intro: ECCV 2016
- arxiv: https://arxiv.org/abs/1609.00129
- paper: http://lrs.icg.tugraz.at/pubs/opitz_eccv_16.pdf
- poster: http://www.eccv2016.org/files/posters/P-2A-34.pdf
A Multi-Scale Cascade Fully Convolutional Network Face Detector
- intro: ICPR 2016
- arxiv: http://arxiv.org/abs/1609.03536
MTCNN
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks
- project page: https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html
- arxiv: https://arxiv.org/abs/1604.02878
- github(Matlab): https://github.com/kpzhang93/MTCNN_face_detection_alignment
- github(MXNet): https://github.com/pangyupo/mxnet_mtcnn_face_detection
- github: https://github.com/DaFuCoding/MTCNN_Caffe
- github(MXNet): https://github.com/Seanlinx/mtcnn
Face Detection using Deep Learning: An Improved Faster RCNN Approach
- intro: DeepIR Inc
- arxiv: https://arxiv.org/abs/1701.08289
Faceness-Net: Face Detection through Deep Facial Part Responses
- intro: An extended version of the ICCV 2015 paper
- arxiv: https://arxiv.org/abs/1701.08393
Datasets / Benchmarks
FDDB: Face Detection Data Set and Benchmark
- homepage: http://vis-www.cs.umass.edu/fddb/index.html
- results: http://vis-www.cs.umass.edu/fddb/results.html
WIDER FACE: A Face Detection Benchmark
Facial Point / Landmark Detection
Deep Convolutional Network Cascade for Facial Point Detection
- homepage: http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm
- paper: http://www.ee.cuhk.edu.hk/~xgwang/papers/sunWTcvpr13.pdf
- github: https://github.com/luoyetx/deep-landmark
Facial Landmark Detection by Deep Multi-task Learning
- intro: ECCV 2014
- project page: http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html
- paper: http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepfacealign.pdf
- github(Matlab): https://github.com/zhzhanp/TCDCN-face-alignment
A Recurrent Encoder-Decoder Network for Sequential Face Alignment
- intro: ECCV 2016
- arxiv: https://arxiv.org/abs/1608.05477
Detecting facial landmarks in the video based on a hybrid framework
Deep Constrained Local Models for Facial Landmark Detection
Effective face landmark localization via single deep network
People Detection
End-to-end people detection in crowded scenes
- arxiv: http://arxiv.org/abs/1506.04878
- github: https://github.com/Russell91/reinspect
- ipython notebook: http://nbviewer.ipython.org/github/Russell91/ReInspect/blob/master/evaluation_reinspect.ipynb
Detecting People in Artwork with CNNs
- intro: ECCV 2016 Workshops
- arxiv: https://arxiv.org/abs/1610.08871
Person Head Detection
Context-aware CNNs for person head detection
Pedestrian Detection
Pedestrian Detection aided by Deep Learning Semantic Tasks
- intro: CVPR 2015
- project page: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/
- paper: http://arxiv.org/abs/1412.0069
Deep Learning Strong Parts for Pedestrian Detection
- intro: ICCV 2015. CUHK. DeepParts
- intro: Achieving 11.89% average miss rate on Caltech Pedestrian Dataset
- paper: http://personal.ie.cuhk.edu.hk/~pluo/pdf/tianLWTiccv15.pdf
Deep convolutional neural networks for pedestrian detection
Scale-aware Fast R-CNN for Pedestrian Detection
New algorithm improves speed and accuracy of pedestrian detection
Pushing the Limits of Deep CNNs for Pedestrian Detection
- intro: “set a new record on the Caltech pedestrian dataset, lowering the log-average miss rate from 11.7% to 8.9%”
- arxiv: http://arxiv.org/abs/1603.04525
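The Caltech figures quoted in this section are log-average miss rates. Miss rate is the fraction of ground-truth pedestrians the detector fails to find, and the benchmark summarizes a miss-rate curve by averaging it in log space over a set of reference false-positives-per-image points. Below is a sketch of both quantities, assuming the miss rates at the reference points have already been sampled from the curve. The function names and the `eps` floor are illustrative choices, not the benchmark's official evaluation code.

```python
import math


def miss_rate(true_positives, false_negatives):
    """Fraction of ground-truth objects that were not detected."""
    total = true_positives + false_negatives
    return false_negatives / total if total else 0.0


def log_average_miss_rate(miss_rates, eps=1e-10):
    """Geometric mean of miss rates, i.e. the mean taken in log space."""
    # eps guards against log(0) when a sampled miss rate is exactly zero.
    logs = [math.log(max(m, eps)) for m in miss_rates]
    return math.exp(sum(logs) / len(logs))
```

Averaging in log space keeps a single very low miss rate at one operating point from being swamped by the larger values at the others.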
A Real-Time Deep Learning Pedestrian Detector for Robot Navigation
A Real-Time Pedestrian Detector using Deep Learning for Human-Aware Navigation
Is Faster R-CNN Doing Well for Pedestrian Detection?
- intro: ECCV 2016
- arxiv: http://arxiv.org/abs/1607.07032
- github: https://github.com/zhangliliang/RPN_BF/tree/RPN-pedestrian
Reduced Memory Region Based Deep Convolutional Neural Network Detection
- intro: IEEE 2016 ICCE-Berlin
- arxiv: http://arxiv.org/abs/1609.02500
Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection
Multispectral Deep Neural Networks for Pedestrian Detection
- intro: BMVC 2016 oral
- arxiv: https://arxiv.org/abs/1611.02644
Vehicle Detection
DAVE: A Unified Framework for Fast Vehicle Detection and Annotation
- intro: ECCV 2016
- arxiv: http://arxiv.org/abs/1607.04564
Evolving Boxes for fast Vehicle Detection
Traffic-Sign Detection
Traffic-Sign Detection and Classification in the Wild
- project page(code+dataset): http://cg.cs.tsinghua.edu.cn/traffic-sign/
- paper: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhu_Traffic-Sign_Detection_and_CVPR_2016_paper.pdf
- code & model: http://cg.cs.tsinghua.edu.cn/traffic-sign/data_model_code/newdata0411.zip
Boundary / Edge / Contour Detection
Holistically-Nested Edge Detection
- intro: ICCV 2015, Marr Prize
- paper: http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Xie_Holistically-Nested_Edge_Detection_ICCV_2015_paper.pdf
- arxiv: http://arxiv.org/abs/1504.06375
- github: https://github.com/s9xie/hed
Unsupervised Learning of Edges
- intro: CVPR 2016. Facebook AI Research
- arxiv: http://arxiv.org/abs/1511.04166
- blog (Chinese): http://www.leiphone.com/news/201607/b1trsg9j6GSMnjOP.html
Pushing the Boundaries of Boundary Detection using Deep Learning
Convolutional Oriented Boundaries
- intro: ECCV 2016
- arxiv: http://arxiv.org/abs/1608.02755
Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks
- project page: http://www.vision.ee.ethz.ch/~cvlsegmentation/
- arxiv: https://arxiv.org/abs/1701.04658
Richer Convolutional Features for Edge Detection
- intro: richer convolutional features (RCF)
- arxiv: https://arxiv.org/abs/1612.02103
Skeleton Detection
Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs
DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images
Fruit Detection
Deep Fruit Detection in Orchards
Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards
- intro: The Journal of Field Robotics in May 2016
- project page: http://confluence.acfr.usyd.edu.au/display/AGPub/
- arxiv: https://arxiv.org/abs/1610.08120
Others
Deep Deformation Network for Object Landmark Localization
Fashion Landmark Detection in the Wild
Deep Learning for Fast and Accurate Fashion Item Detection
- intro: Kuznech Inc.
- intro: MultiBox and Fast R-CNN
- paper: https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Deep%20Learning%20for%20Fast%20and%20Accurate%20Fashion%20Item%20Detection.pdf
Visual Relationship Detection with Language Priors
- intro: ECCV 2016 oral
- paper: https://cs.stanford.edu/people/ranjaykrishna/vrd/vrd.pdf
- github: https://github.com/Prof-Lu-Cewu/Visual-Relationship-Detection
OSMDeepOD - OSM and Deep Learning based Object Detection from Aerial Imagery (formerly known as “OSM-Crosswalk-Detection”)
Selfie Detection by Synergy-Constraint Based Convolutional Neural Network
- intro: IEEE SITIS 2016
- arxiv: https://arxiv.org/abs/1611.04357
Associative Embedding: End-to-End Learning for Joint Detection and Grouping
Deep Cuboid Detection: Beyond 2D Bounding Boxes
- intro: CMU & Magic Leap
- arxiv: https://arxiv.org/abs/1611.10010
Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection
Deep Learning Logo Detection with Data Expansion by Synthesising Context
Pixel-wise Ear Detection with Convolutional Encoder-Decoder Networks
Object Proposal
DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers
Scale-aware Pixel-wise Object Proposal Networks
- intro: IEEE Transactions on Image Processing
- arxiv: http://arxiv.org/abs/1601.04798
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
- intro: BMVC 2016. AttractioNet
- arxiv: https://arxiv.org/abs/1606.04446
- github: https://github.com/gidariss/AttractioNet
Learning to Segment Object Proposals via Recursive Neural Networks
Localization
Beyond Bounding Boxes: Precise Localization of Objects in Images
- intro: PhD Thesis
- homepage: http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.html
- phd-thesis: http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.pdf
- github(“SDS using hypercolumns”): https://github.com/bharath272/sds
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
Weakly Supervised Object Localization Using Size Estimates
Active Object Localization with Deep Reinforcement Learning
- intro: ICCV 2015
- keywords: Markov Decision Process
- arxiv: https://arxiv.org/abs/1511.06015
Localizing objects using referring expressions
- intro: ECCV 2016
- keywords: LSTM, multiple instance learning (MIL)
- paper: http://www.umiacs.umd.edu/~varun/files/refexp-ECCV16.pdf
- github: https://github.com/varun-nagaraja/referring-expressions
LocNet: Improving Localization Accuracy for Object Detection
Learning Deep Features for Discriminative Localization
- homepage: http://cnnlocalization.csail.mit.edu/
- arxiv: http://arxiv.org/abs/1512.04150
- github(Tensorflow): https://github.com/jazzsaxmafia/Weakly_detector
- github: https://github.com/metalbubble/CAM
- github: https://github.com/tdeboissiere/VGG16CAM-keras
ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization
- intro: ECCV 2016
- project page: http://www.di.ens.fr/willow/research/contextlocnet/
- arxiv: http://arxiv.org/abs/1609.04331
- github: https://github.com/vadimkantorov/contextlocnet
Tutorials / Talks
Convolutional Feature Maps: Elements of efficient (and accurate) CNN-based object detection
Towards Good Practices for Recognition & Detection
- intro: Hikvision Research Institute. Supervised Data Augmentation (SDA)
- slides: http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf
Projects
TensorBox: a simple framework for training neural networks to detect objects in images
- intro: “The basic model implements the simple and robust GoogLeNet-OverFeat algorithm. We additionally provide an implementation of the ReInspect algorithm”
- github: https://github.com/Russell91/TensorBox
Object detection in torch: Implementation of some object detection frameworks in torch
Using DIGITS to train an Object Detection network
FCN-MultiBox Detector
- intro: Fully convolutional MultiBox Detector (like SSD) implemented in Torch.
- github: https://github.com/teaonly/FMD.torch
KittiBox: A car detection model implemented in Tensorflow.
- keywords: MultiNet
- intro: KittiBox is a collection of scripts to train our model FastBox on the Kitti Object Detection Dataset
- github: https://github.com/MarvinTeichmann/KittiBox
Blogs
Convolutional Neural Networks for Object Detection
- blog: http://rnd.azoft.com/convolutional-neural-networks-object-detection/
Introducing automatic object detection to visual search (Pinterest)
- keywords: Faster R-CNN
- blog: https://engineering.pinterest.com/blog/introducing-automatic-object-detection-visual-search
- demo: https://engineering.pinterest.com/sites/engineering/files/Visual%20Search%20V1%20-%20Video.mp4
- review: https://news.developer.nvidia.com/pinterest-introduces-the-future-of-visual-search/
Deep Learning for Object Detection with DIGITS
Analyzing The Papers Behind Facebook’s Computer Vision Approach
- keywords: DeepMask, SharpMask, MultiPathNet
- blog: https://adeshpande3.github.io/adeshpande3.github.io/Analyzing-the-Papers-Behind-Facebook's-Computer-Vision-Approach/
Easily Create High Quality Object Detectors with Deep Learning
- intro: dlib v19.2
- blog: http://blog.dlib.net/2016/10/easily-create-high-quality-object.html
How to Train a Deep-Learned Object Detection Model in the Microsoft Cognitive Toolkit
- blog: https://blogs.technet.microsoft.com/machinelearning/2016/10/25/how-to-train-a-deep-learned-object-detection-model-in-cntk/
- github: https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Detection/FastRCNN
Object Detection in Satellite Imagery, a Low Overhead Approach
- part 1: https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-i-cbd96154a1b7#.2csh4iwx9
- part 2: https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-ii-893f40122f92#.f9b7dgf64
You Only Look Twice — Multi-Scale Object Detection in Satellite Imagery With Convolutional Neural Networks
- part 1: https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-38dad1cf7571#.fmmi2o3of
- part 2: https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-34f72f659588#.nwzarsz1t
Faster R-CNN Pedestrian and Car Detection
- blog: https://bigsnarf.wordpress.com/2016/11/07/faster-r-cnn-pedestrian-and-car-detection/
- ipynb: https://gist.github.com/bigsnarfdude/2f7b2144065f6056892a98495644d3e0#file-demo_faster_rcnn_notebook-ipynb
- github: https://github.com/bigsnarfdude/Faster-RCNN_TF
Small U-Net for vehicle detection