
Hierarchical Object Detection with Deep Reinforcement Learning

Paper accepted at the Deep Reinforcement Learning Workshop, NIPS 2016

Míriam Bellver, Xavier Giro-i-Nieto, Ferran Marqués, Jordi Torres

A joint collaboration between the Barcelona Supercomputing Center (BSC) and the UPC Image Processing Group.

Summary

We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus its attention among five predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two candidate proposal strategies to guide the object search: with and without overlap.

Hierarchy of overlapping region proposals
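
The layout of these candidates can be made concrete with a short sketch. The following is illustrative, not the repository's exact code: it assumes the four-corners-plus-center layout shown in the figure, boxes given as (x, y, w, h) tuples, and the default scale of 3/4.

def candidate_subregions(x, y, w, h, scale=0.75):
    """Return the five overlapping candidate windows: four corners plus the center."""
    sw, sh = int(w * scale), int(h * scale)  # size of each child window
    offx, offy = w - sw, h - sh              # slack left for placing the window
    return [
        (x,             y,             sw, sh),  # top-left
        (x + offx,      y,             sw, sh),  # top-right
        (x,             y + offy,      sw, sh),  # bottom-left
        (x + offx,      y + offy,      sw, sh),  # bottom-right
        (x + offx // 2, y + offy // 2, sw, sh),  # center
    ]

# One zoom step on a 640x480 image
print(candidate_subregions(0, 0, 640, 480))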

Moreover, our work compares two strategies for extracting convolutional neural network features for each region proposal: one that computes new feature maps for each region proposal, and one that computes the feature maps once for the whole image and later crops them for each region proposal.

Architectures for convolutional feature extraction
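
The difference between the two strategies can be sketched as follows. Here cnn_features is a dummy stand-in so the sketch runs on its own, and the feature stride of 16 is an assumption chosen for illustration; the actual repository code uses VGG-16 and differs in detail.

import numpy as np

STRIDE = 16  # assumed downsampling factor of the convolutional backbone

def cnn_features(image):
    """Stand-in for a CNN forward pass: (H, W, 3) -> (H/STRIDE, W/STRIDE, 512)."""
    h, w, _ = image.shape
    return np.zeros((h // STRIDE, w // STRIDE, 512))

def features_image_zooms(image, box):
    """Strategy 1: run the CNN anew on each visited region (full resolution)."""
    x, y, w, h = box
    return cnn_features(image[y:y + h, x:x + w])

def features_pool45_crops(feature_map, box):
    """Strategy 2: compute the feature map once, then crop it per region.
    Spatial resolution is lost because boxes are quantized to feature cells."""
    x, y, w, h = box
    fx, fy = x // STRIDE, y // STRIDE
    fw, fh = max(1, w // STRIDE), max(1, h // STRIDE)
    return feature_map[fy:fy + fh, fx:fx + fw]

image = np.zeros((480, 640, 3))
feature_map = cnn_features(image)  # computed once for the whole image
box = (160, 120, 320, 240)
print(features_image_zooms(image, box).shape)
print(features_pool45_crops(feature_map, box).shape)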

Experiments indicate better results for the overlapping candidate proposal strategy, and a loss of performance for the cropped image features due to their reduced spatial resolution. We argue that, while this loss seems unavoidable when working with large numbers of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.

Qualitative results

Publication

Our workshop paper is available on arXiv, and the related slides are available here.

Please cite with the following BibTeX code:

@InProceedings{Bellver_2016_NIPSWS,
author = {Bellver, Miriam and Giro-i-Nieto, Xavier and Marques, Ferran and Torres, Jordi},
title = {Hierarchical Object Detection with Deep Reinforcement Learning},
booktitle = {Deep Reinforcement Learning Workshop, NIPS},
month = {December},
year = {2016}
}

You may also want to refer to our publication in the more human-friendly Chicago style:

Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, and Jordi Torres. "Hierarchical Object Detection with Deep Reinforcement Learning." In Deep Reinforcement Learning Workshop (NIPS). 2016.

Code Instructions

This Python code enables both training and testing of each of the two models proposed in the paper. The Image Zooms model extracts features for each region visited, whereas the Pool45 Crops model extracts features just once and then ROI-pools them for each subregion. In this section we describe how to use the code. The code uses the Keras framework. If you are using a virtual environment, you can install the dependencies from the provided requirements.txt.

First, it is important to note that this code is already an extension of the code used for the paper. During the training stage, we do not consider only one object per image: we also train for other objects by covering the already-found objects with the VGG-16 mean, inspired by what Caicedo et al. did in Active Object Localization with Deep Reinforcement Learning.
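
Covering a found object can be pictured with the following sketch (illustrative names and box format, not the repository's exact code); the per-channel values are the standard BGR means subtracted in VGG-16 preprocessing.

import numpy as np

VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68])  # standard VGG-16 mean pixel

def cover_object(image, box):
    """Paint a detected box with the mean pixel so the agent moves on to other objects.
    Assumes a float image array in BGR channel order."""
    x, y, w, h = box
    image[y:y + h, x:x + w] = VGG_MEAN_BGR
    return image

image = np.zeros((480, 640, 3), dtype=np.float32)
cover_object(image, (100, 50, 200, 150))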

Setup

First of all, the VGG-16 weights should be downloaded from the following link: VGG-16 weights. If you want to use pre-trained models for the Deep Q-network, they can be downloaded from the following link: Image Zooms model. Note that these models may yield results different from those reported in the paper, because they have been trained to find more than one instance of planes per image. You should also create two folders in the root of the project, called models_image_zooms and models_pool45_crops, and store the corresponding weights inside them.
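
A small helper like the following can prepare this layout; it is an illustrative sketch assuming the folders live in the project root, so adjust the paths if yours differ.

import os

for folder in ("models_image_zooms", "models_pool45_crops"):
    os.makedirs(folder, exist_ok=True)  # create the model folders if missing

if not os.path.isfile("vgg16_weights.h5"):
    print("Download the VGG-16 weights and place them at ./vgg16_weights.h5")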

Usage

Training

As an example, we will follow how to train the Image Zooms model, which is the one that achieves the best results. The instructions are the same for training the Pool45 Crops model. The script is image_zooms_training.py, and first the paths to the databases should be configured. The default paths are the following:

# path of PASCAL VOC 2012 or other database to use for training
path_voc = "./VOC2012/"
# path of other PASCAL VOC dataset, if you want to train with 2007 and 2012 train datasets
path_voc2 = "./VOC2007/"
# path of where to store the models
path_model = "../models_image_zooms"
# path of where to store visualizations of search sequences
path_testing_folder = '../testing_visualizations'
# path of VGG16 weights
path_vgg = "../vgg16_weights.h5"

But you can change them to point to your own locations.

The training of the models supports checkpointing, so you should indicate from which epoch you are resuming when running the script. If you are training from scratch, the command is:

python image_zooms_training.py -n 0

There are many options that can be changed to test different configurations:

class_object: the class for which you want to train the models. We have trained it for planes, and all the experiments in the paper are run on this class, but you can test other PASCAL categories, also changing the training databases appropriately.

number_of_steps: how many steps the agent searches for an object in an image.

scale_subregion: the scale of each subregion in the hierarchy compared to its ancestor. The default value is 3/4, which gave good results in our experiments, but it can easily be changed. Take into account that the subregion scale and the number of steps are closely related: if the subregion scale is high, you will probably require more steps to find objects (see the sketch after this list).

bool_draw: a boolean flag; if set to 1, visualizations of the image search sequences are stored.
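
To see how the two options interact: each zoom multiplies the window side by scale_subregion, so after k steps the window covers scale**k of the original side length, and reaching an object of relative size t takes roughly log(t) / log(scale) steps. A quick illustrative computation:

import math

def steps_needed(target_relative_size, scale=0.75):
    """Rough number of zoom steps to shrink the window to the target relative size."""
    return math.ceil(math.log(target_relative_size) / math.log(scale))

print(steps_needed(0.25))        # default scale 3/4 -> about 5 steps
print(steps_needed(0.25, 0.9))   # larger scale 9/10 -> about 14 steps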

At each epoch the models will be saved in the models_image_zooms folder.

Testing

To test the models, use the script image_zooms_testing.py. You should also configure the paths to indicate which weights you want to use, in the same manner as in the training stage. In this case you only need to run the command python image_zooms_testing.py. For testing it is recommended to set bool_draw = 1, so that you can observe visualizations of the object search sequences. There is also the option to search for just a single object in each image, reproducing the results of our paper, by setting the boolean only_first_object to 1.

Acknowledgements

We would especially like to thank Albert Gil Moreno and Josep Pujal from the technical support team of the Image Processing Group at the UPC, as well as Carlos Tripiana from the technical support team at the Barcelona Supercomputing Center (BSC).

This work has been supported by grant SEV2015-0493 of the Severo Ochoa Program awarded by the Spanish Government, by project TIN2015-65316 of the Spanish Ministry of Science and Innovation, and by contract 2014-SGR-1051 of the Generalitat de Catalunya.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work at the UPC, and of the BSC/UPC NVIDIA GPU Center of Excellence.
The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).

Contact

If you have any general doubt about our work or code which may be of interest to other researchers, please use the public issues section of this GitHub repository. Alternatively, drop us an e-mail at [email protected] or [email protected].
