Bags of Local Convolutional Features for Scalable Instance Search

Best Poster Award at the ACM International Conference on Multimedia Retrieval (ICMR) 2016

Eva Mohedano | Amaia Salvador | Kevin McGuinness | Xavier Giro-i-Nieto | Noel O'Connor | Ferran Marques

A joint collaboration between:

Insight Centre for Data Analytics | Dublin City University (DCU) | Universitat Politecnica de Catalunya (UPC) | UPC ETSETB TelecomBCN | UPC Image Processing Group

Abstract

This work proposes a simple instance retrieval pipeline based on encoding the convolutional features of a CNN using the bag-of-words (BoW) aggregation scheme. Assigning each local array of activations in a convolutional layer to a visual word produces an assignment map, a compact representation that relates regions of an image with a visual word. We use the assignment map for fast spatial reranking, obtaining object localizations that are used for query expansion. We demonstrate the suitability of the BoW representation based on local CNN features for instance retrieval, achieving competitive performance on the Oxford and Paris buildings benchmarks. We show that our proposed system for CNN feature aggregation with BoW outperforms state-of-the-art techniques using sum pooling on a subset of the challenging TRECVid INS benchmark.
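
To make the encoding concrete, here is a minimal NumPy sketch (illustrative only, not the repository's code) of how a convolutional feature map becomes an assignment map and a BoW histogram; all variable names and shapes are hypothetical.

    import numpy as np

    def bow_encode(feature_map, centroids):
        # feature_map: CNN activations for one image, shape (D, H, W).
        # centroids:   visual vocabulary of K words, shape (K, D).
        D, H, W = feature_map.shape
        # Each spatial position is a D-dimensional local feature.
        local_feats = feature_map.reshape(D, H * W).T               # (H*W, D)
        # Assign every local feature to its nearest visual word.
        dists = ((local_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assignment_map = dists.argmin(axis=1).reshape(H, W)         # word id per position
        # The BoW vector is the histogram of visual-word assignments.
        bow = np.bincount(assignment_map.ravel(), minlength=len(centroids))
        return assignment_map, bow.astype(np.float32)

    # Toy example: 512-dim conv features on a 3x3 grid, vocabulary of 8 words.
    amap, bow = bow_encode(np.random.rand(512, 3, 3), np.random.rand(8, 512))
    print(amap.shape, bow.shape)  # (3, 3) (8,)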

Publication

Find our paper at ACM Digital Library, arXiv and DCU Doras.


Please cite with the following BibTeX code:

@inproceedings{Mohedano:2016:BLC:2911996.2912061,
 author = {Mohedano, Eva and McGuinness, Kevin and O'Connor, Noel E. and Salvador, Amaia and Marques, Ferran and Giro-i-Nieto, Xavier},
 title = {Bags of Local Convolutional Features for Scalable Instance Search},
 booktitle = {Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval},
 series = {ICMR '16},
 year = {2016},
 isbn = {978-1-4503-4359-6},
 location = {New York, New York, USA},
 pages = {327--331},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/2911996.2912061},
 doi = {10.1145/2911996.2912061},
 acmid = {2912061},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {bag of words, convolutional neural networks, instance retrieval},
} 

Best Poster Award at ICMR 2016

https://github.com/imatge-upc/retrieval-2016-icmr/raw/master/docs/icmr2016-poster.pdf


Talk on video

<iframe src="https://player.vimeo.com/video/165478041" width="640" height="480" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

2016-05-Seminar-AmaiaSalvador-DeepVision from Image Processing Group on Vimeo.

This talk also covers our paper "Faster R-CNN features for Instance Search" at CVPR 2016 Workshop on DeepVision.

Slides

<iframe src="//www.slideshare.net/slideshow/embed_code/key/lZzb4HdY6OEZ01" width="595" height="485" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>

These slides also cover our paper "Faster R-CNN features for Instance Search" at CVPR 2016 Workshop on DeepVision.

Code Instructions

Description

This repository contains scripts to build bags of visual words from local CNN features and perform instance search on three different datasets:

  • Oxford Buildings (and Oxford 105k).

  • Paris Buildings (and Paris 106k).

  • Trecvid_subset: a subset of 23,614 keyframes (~13,000 video shots) from the TRECVid INS dataset. Keyframes were extracted uniformly at 1/4 fps. Queries and ground truth correspond to INS 2013.

Prerequisites

The required Python packages are specified in requirements.txt. To install them, run:

 pip install -r requirements.txt

It also needs:

  • caffe with python support

  • vlfeat library

    • Once installed, replace the kmeans.py file located in the vlfeat Python package (e.g. /usr/local/lib/python2.7/dist-packages/vlfeat_ctypes-0.1.4-py2.7.egg/vlfeat/kmeans.py) with lib/kmeans.py from this repository.
  • invidx module: follow the instructions in lib/py-inverted-index/.

NOTE You can create a virtual environment to set up this project's dependencies in isolation, without modifying your original Python environment. Check how to create a virtual environment. Using the --system-site-packages flag when creating the venv gives it access to all the packages from your Python installation, so you avoid re-installing packages you already have and only need to install the new ones.

How to run it?

The bow_pipeline folder contains the main scripts. Parameters must be specified in the settings.json file located in the folder named after each dataset.

Step 1: FEATURE EXTRACTION (bow_pipeline/A_feature_extraction.py)

This script extracts features from a pre-trained CNN (by default, a fully convolutional VGG-16 network). It computes descriptors from the layer(s) specified in the Layer_output parameter and stores them in a LevelDB database at the path given by the featuresDB parameter in settings.json, using the format [featuresDB]/[layer]_db.

NOTE Features are stored as the original dictionary created by Caffe's Net class. For reading, use the Local_Feature_ReaderDB class located in bow_pipeline/reader.py. This class contains methods to extract local features in the format (n_samples, n_dimensions) per image for BoW encoding, to perform sum pooling (generating a (1, n_dimensions) vector per image), and to interpolate feature maps when reading. For more information, check the bow_pipeline/reader.py script.
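
For orientation, here is a minimal sketch of this extraction step using Caffe's Python interface. The model paths, layer name, and input handling are placeholders (the actual values come from settings.json), and mean subtraction is omitted for brevity.

    import caffe

    # Hypothetical paths and layer; the real values come from settings.json.
    PROTOTXT = 'models/vgg16_fullconv.prototxt'
    WEIGHTS = 'models/vgg16.caffemodel'
    LAYER = 'conv5_1'

    caffe.set_mode_cpu()
    net = caffe.Net(PROTOTXT, WEIGHTS, caffe.TEST)

    def extract_local_features(image_bgr):
        # image_bgr: HxWx3 float array in BGR order (mean subtraction omitted).
        # Returns local conv features as (n_samples, n_dimensions) for BoW encoding.
        h, w = image_bgr.shape[:2]
        # Fully convolutional net: reshape the input blob to this image's size.
        net.blobs['data'].reshape(1, 3, h, w)
        net.blobs['data'].data[0] = image_bgr.transpose(2, 0, 1)  # HWC -> CHW
        net.forward()
        fmap = net.blobs[LAYER].data[0]            # (D, H', W')
        return fmap.reshape(fmap.shape[0], -1).T   # (H'*W', D)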

Step 2: BUILDING VISUAL VOCABULARY (bow_pipeline/B_processing_clustering.py)

This script performs the clustering of the local features. It is necessary to set the following parameters:

TRAIN_PCA=True -- whether to train PCA/whitening (default True)
TRAIN_CENTROIDS=True -- whether to train the centroids (default True)
l2norm=True -- whether to l2-normalize the features (default True)
n_centers=25000 -- number of clusters
pca_dim=512 -- number of dimensions kept when applying PCA

It is also necessary to specify the settings.json file (different for each dataset), which contains the paths for reading the features and storing the models.
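
As a rough illustration of what this stage computes, here is a sketch using scikit-learn in place of the repository's vlfeat-based code, with random stand-in data and a smaller vocabulary so it runs quickly; the comments mirror the parameters above.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import normalize

    local_feats = np.random.rand(10000, 512)              # stand-in local CNN features

    feats = normalize(local_feats)                        # l2norm=True
    pca = PCA(n_components=512, whiten=True).fit(feats)   # TRAIN_PCA=True, pca_dim=512
    feats = normalize(pca.transform(feats))               # re-normalize after whitening
    kmeans = MiniBatchKMeans(n_clusters=1000).fit(feats)  # TRAIN_CENTROIDS (25000 in practice)
    centroids = kmeans.cluster_centers_                   # the visual vocabulary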

Step 3: ASSIGNMENTS AND INVERTED FILE GENERATION (bow_pipeline/C_bow_representation.py)

Once the visual vocabulary is built, we can compute the assignments from the local features of each image and index the image dataset. It is necessary to set the following parameters (check the script); a sketch of the inverted-index construction follows the list:

settings file for the dataset
dim_input -- network input dimensions in string format
network="vgg16" -- string with network name to use
list_layers -- list with layer/s to use
new_dim -- tuple with the feature map dimension
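
The inverted file maps each visual word to the images in which it appears, so that at query time only images sharing words with the query need to be scored. The toy sketch below shows the idea; it is not the invidx module's actual API.

    import numpy as np
    from collections import defaultdict

    def build_inverted_index(image_ids, assignment_maps):
        # word id -> list of (image id, term frequency) postings
        inverted = defaultdict(list)
        for img_id, amap in zip(image_ids, assignment_maps):
            words, counts = np.unique(amap, return_counts=True)
            for w, c in zip(words, counts):
                inverted[int(w)].append((img_id, int(c)))
        return inverted

    # Toy usage with two 2x2 assignment maps over a 4-word vocabulary.
    index = build_inverted_index(['img0', 'img1'],
                                 [np.array([[0, 1], [1, 3]]),
                                  np.array([[2, 2], [3, 3]])])
    print(index[3])  # [('img0', 1), ('img1', 2)]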

Step 4: RANKINGS GENERATION (bow_pipeline/D_rankings_BoW.py)

It uses the generated inverted file to produce the rankings for the queries, computing the assignments for each query on the fly. It is necessary to set the same parameters as in Step 3, plus the following (a sketch of the query masking follows the list):

  - masking = 3 (kind of masking applied to the query):
        0 == no mask;
        1 == only consider words from the foreground;
        2 == only consider words from the background [CHANGED];
        3 == apply an inverse weight to the foreground object
  - augmentation = [0] [TO REVIEW]
        0 == no augmentation;
        1 == 0 + flipped image;
        2 == 0 + zoomed image (same size as the network input, but capturing only the
             center crop of the image after zooming it to double size);
        3 == 0 + flipped zoomed crop
  - QUERY_EXPANSION [TO_ADD]
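
To illustrate the masking modes, the sketch below weights the query's visual words given a binary foreground mask over the assignment map. The weighting shown for mode 3 (and the 0.1 background factor) is an assumption for illustration, not the script's actual scheme.

    import numpy as np

    def query_word_weights(assignment_map, fg_mask, masking=3):
        words = assignment_map.ravel()
        fg = fg_mask.ravel().astype(bool)
        K = int(words.max()) + 1
        if masking == 0:        # 0: no mask, plain BoW histogram
            w = np.bincount(words, minlength=K)
        elif masking == 1:      # 1: foreground words only
            w = np.bincount(words[fg], minlength=K)
        elif masking == 2:      # 2: background words only
            w = np.bincount(words[~fg], minlength=K)
        else:                   # 3: emphasize foreground, down-weight background
            w = (np.bincount(words[fg], minlength=K)
                 + 0.1 * np.bincount(words[~fg], minlength=K))  # 0.1 is illustrative
        return w.astype(np.float32)

    amap = np.random.randint(0, 8, size=(3, 3))   # toy 3x3 assignment map, 8 words
    mask = np.zeros((3, 3)); mask[1, 1] = 1       # single foreground position
    print(query_word_weights(amap, mask, masking=3))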

NOTE If using bow_pipeline/D_rankings_Pooling.py, skip Steps 2 and 3. That pipeline is based entirely on sum-pooled features (no inverted index or codebook generation).

One .txt file is generated per query under datasetFolder/lists_[bow/pooling].

Step 5: EVALUATION (bow_pipeline/evaluate_[Oxf_Par/trecvid].py)

It computes the mean average precision (mAP) for the rankings generated in Step 4.

  • evaluate_Oxf_Par for Paris and Oxford datasets

  • evaluate_trecvid for TRECVid subset.

NOTE It generates a map.txt file with the results in the bow_pipeline folder.
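
For reference, mean average precision over a set of queries can be computed as in this small self-contained sketch (illustrative; the actual evaluation scripts follow each benchmark's official protocol).

    def average_precision(ranked_ids, relevant_ids):
        # AP for one query: mean of the precision values at each relevant hit.
        relevant = set(relevant_ids)
        hits, precisions = 0, []
        for rank, img_id in enumerate(ranked_ids, start=1):
            if img_id in relevant:
                hits += 1
                precisions.append(hits / float(rank))
        return sum(precisions) / len(relevant) if relevant else 0.0

    # mAP is the mean of the per-query APs (toy rankings and ground truth).
    queries = {'q1': (['a', 'b', 'c'], ['a', 'c']),
               'q2': (['c', 'a'], ['a'])}
    mAP = sum(average_precision(r, g) for r, g in queries.values()) / len(queries)
    print(round(mAP, 3))  # 0.667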
