Deep Image Retrieval

This repository contains the models and the evaluation scripts (in Python 3 and PyTorch 1.0+) for the papers:

[1] End-to-end Learning of Deep Visual Representations for Image Retrieval. Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus. IJCV 2017 [PDF]

[2] Learning with Average Precision: Training Image Retrieval with a Listwise Loss. Jerome Revaud, Jon Almazan, Rafael S. Rezende, Cesar de Souza. ICCV 2019 [PDF]

Both papers tackle the problem of image retrieval and explore different ways to learn deep visual representations for this task. In both cases, a CNN is used to extract a feature map, which is aggregated into a compact, fixed-length representation by a global-aggregation layer*. Finally, this representation is projected with a fully-connected (FC) layer and L2-normalized, so that images can be efficiently compared with the dot product.

[Figure: overview of the retrieval network (dir_network)]
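
For intuition, the full pipeline can be sketched in a few lines of PyTorch (a minimal sketch with hypothetical class and parameter names, using GeM as the global-aggregation layer; the actual implementation lives in the dirtorch package):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class RetrievalNet(nn.Module):
    """Backbone -> global aggregation (GeM) -> FC projection -> L2 normalization."""
    def __init__(self, out_dim=2048, p=3.0):
        super().__init__()
        resnet = torchvision.models.resnet101(pretrained=False)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # keep the spatial feature map
        self.p = nn.Parameter(torch.tensor(p))                        # learnable GeM exponent
        self.fc = nn.Linear(2048, out_dim)                            # projection / whitening layer

    def forward(self, x):
        fmap = self.backbone(x)                                       # B x 2048 x H x W
        # Generalized-mean pooling: (mean of x^p)^(1/p); p=1 is average pooling, p->inf is max pooling
        pooled = fmap.clamp(min=1e-6).pow(self.p).mean(dim=(-2, -1)).pow(1.0 / self.p)
        return F.normalize(self.fc(pooled), dim=-1)                   # unit-norm global descriptor

Because the descriptors are unit-normalized, the similarity between two images is simply the dot product of their descriptors.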

All components in this network, including the aggregation layer, are differentiable, which makes it end-to-end trainable for the end task. In [1], a Siamese architecture that combines three streams with a triplet loss was proposed to train this network. In [2], this work was extended by replacing the triplet loss with a new loss that directly optimizes for Average Precision.

[Figure: losses]
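
As a rough sketch (not the repository's actual training code), the triplet formulation of [1] can be expressed with PyTorch's built-in loss, while the AP loss of [2] replaces it with a listwise objective computed over a whole batch of descriptors:

import torch.nn as nn

# Triplet margin loss on L2-normalized descriptors, as in [1]
# (the 0.1 margin is illustrative, not necessarily the paper's exact value):
triplet_loss = nn.TripletMarginLoss(margin=0.1, p=2)
# loss = triplet_loss(anchor, positive, negative)   # each of shape B x D

# The AP loss of [2] is listwise: it compares a query descriptor against a large
# batch of database descriptors with binary relevance labels, soft-assigns the
# similarity scores to quantization bins so that ranking becomes differentiable,
# and directly maximizes the resulting approximation of Average Precision.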

* Originally, [1] used R-MAC pooling [3] as the global-aggregation layer. However, due to its efficiency and better performance, we have replaced the R-MAC pooling layer with the Generalized-mean pooling (GeM) layer proposed in [4]. You can find the original Caffe implementation of [1] by following this link.

News

  • (6/9/2019) AP loss, Tie-aware AP loss, Triplet Margin loss, and Triplet LogExp loss added for reference
  • (5/9/2019) Update evaluation and AP numbers for all the benchmarks
  • (22/7/2019) Paper Learning with Average Precision: Training Image Retrieval with a Listwise Loss accepted at ICCV 2019

Pre-requisites

In order to run this toolbox you will need:

  • Python3 (tested with Python 3.7.3)
  • PyTorch (tested with version 1.4)
  • The following packages: numpy, matplotlib, tqdm, scikit-learn

With conda you can run the following commands:

conda install numpy matplotlib tqdm scikit-learn
conda install pytorch torchvision -c pytorch
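
Optionally, you can quickly check that the environment is usable with:

python -c "import numpy, matplotlib, tqdm, sklearn, torch; print(torch.__version__, torch.cuda.is_available())"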

Installation

# Download the code
git clone https://github.com/naver/deep-image-retrieval.git

# Create env variables
cd deep-image-retrieval
export DIR_ROOT=$PWD
export DB_ROOT=/PATH/TO/YOUR/DATASETS
# for example: export DB_ROOT=$PWD/dirtorch/data/datasets

Evaluation

Pre-trained models

The table below lists the pre-trained models that we provide with this library, together with their mAP performance on some of the most well-known image retrieval benchmarks: Oxford5K, Paris6K, and their Revisited versions (ROxford5K and RParis6K).

Model | Oxford5K | Paris6K | ROxford5K (med/hard) | RParis6K (med/hard)
Resnet101-TL-MAC | 85.6 | 90.1 | 63.3 / 35.7 | 76.6 / 55.5
Resnet101-TL-GeM | 85.7 | 93.4 | 64.5 / 40.9 | 78.8 / 59.2
Resnet50-AP-GeM | 87.7 | 91.9 | 65.5 / 41.0 | 77.6 / 57.1
Resnet101-AP-GeM | 89.1 | 93.0 | 67.1 / 42.3 | 80.3 / 60.9
Resnet101-AP-GeM-LM18** | 88.1 | 93.1 | 66.3 / 42.5 | 80.2 / 60.8

The name of each model encodes the backbone architecture and the loss used to train it (TL for triplet loss and AP for Average Precision loss). All models use Generalized-mean pooling (GeM) [4] as the global pooling mechanism, except for the model in the first row, which uses MAC [3] (i.e. max-pooling). All models have been trained on the Landmarks-clean dataset [1] (the clean version of the Landmarks dataset) by fine-tuning directly from ImageNet. These numbers have been obtained using a single resolution and applying whitening to the output features (the whitening has also been learned on Landmarks-clean). For a detailed explanation of all the hyper-parameters, see [1] and [2] for the triplet-loss and AP-loss models, respectively.

** For the sake of completeness, we have added an extra model, Resnet101-AP-GeM-LM18, which has been trained on the Google-Landmarks Dataset, a large dataset consisting of more than 1M images and 15K classes.
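
For intuition only, a generic PCA-whitening step on a set of descriptors can be sketched as follows (this is not the repository's learned whitening, and the exact convention behind the --whitenp power is defined in the code):

import numpy as np
from sklearn.decomposition import PCA

train_feats = np.random.randn(1000, 2048).astype(np.float32)  # stand-in for real L2-normalized descriptors
pca = PCA(whiten=True)
pca.fit(train_feats)

def whiten(feats):
    projected = pca.transform(feats)
    # re-normalize so that the dot product remains a cosine similarity
    return projected / np.linalg.norm(projected, axis=1, keepdims=True)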

Reproducing the results

The script test_dir.py can be used to evaluate the pre-trained models provided and to reproduce the results above:

python -m dirtorch.test_dir --dataset DATASET --checkpoint PATH_TO_MODEL \
		[--whiten DATASET] [--whitenp POWER] [--aqe ALPHA-QEXP] \
		[--trfs TRANSFORMS] [--gpu ID] [...]

  • --dataset: selects the dataset (e.g. Oxford5K, Paris6K, ROxford5K, RParis6K) [required]
  • --checkpoint: path to the model weights [required]
  • --whiten: applies whitening to the output features [default: 'Landmarks_clean']
  • --whitenp: whitening power [default: 0.25]
  • --aqe: alpha-query expansion parameters [default: None]
  • --trfs: input image transformations (can be used to apply multi-scale) [default: None]
  • --gpu: selects the GPU ID (-1 selects the CPU)

For example, to reproduce the results of the Resnet101-AP-GeM model on the RParis6K dataset, download the model Resnet101-AP-GeM.pt from here and run:

cd $DIR_ROOT
export DB_ROOT=/PATH/TO/YOUR/DATASETS

python -m dirtorch.test_dir --dataset RParis6K \
		--checkpoint dirtorch/data/Resnet101-AP-GeM.pt \
		--whiten Landmarks_clean --whitenp 0.25 --gpu 0

You should see the following output:

>> Evaluation...
 * mAP-easy = 0.907568
 * mAP-medium = 0.803098
 * mAP-hard = 0.608556
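
For reference, the reported numbers are mean Average Precision values; given a similarity matrix and binary relevance labels, mAP can be computed as in the sketch below (the revisited benchmarks additionally split the ground truth into easy/medium/hard sets):

import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(scores, labels):
    # scores: Q x N query-vs-database similarities, labels: Q x N binary relevance
    aps = [average_precision_score(y, s) for s, y in zip(scores, labels)]
    return float(np.mean(aps))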

Note: this script integrates an automatic downloader for the Oxford5K, Paris6K, ROxford5K, and RParis6K datasets (kudos to Filip Radenovic ;)). The datasets will be saved in $DB_ROOT.

Feature extractor

You can also use the pre-trained models to extract features from your own datasets or collections of images. For that, we provide the script extract_features.py:

python -m dirtorch.extract_features --dataset DATASET --checkpoint PATH_TO_MODEL \
		--output PATH_TO_FILE [--whiten DATASET] [--whitenp POWER] \
		[--trfs TRANSFORMS] [--gpu ID] [...]

where --output is used to specify the destination where the features will be saved. The rest of the parameters are the same as seen above.

For example, this is how the script can be used to extract a feature representation for each of the images in the RParis6K dataset using the Resnet101-AP-GeM.pt model, storing them in rparis6k_features.npy:

cd $DIR_ROOT
export DB_ROOT=/PATH/TO/YOUR/DATASETS

python -m dirtorch.extract_features --dataset RParis6K \
		--checkpoint dirtorch/data/Resnet101-AP-GeM.pt \
		--output rparis6k_features.npy \
		--whiten Landmarks_clean --whitenp 0.25 --gpu 0
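
Once extracted, the descriptors can be loaded with NumPy and compared with a dot product, for example (a sketch; inspect the saved file first, as its exact layout may differ):

import numpy as np

feats = np.load('rparis6k_features.npy', allow_pickle=True)
feats = np.asarray(feats, dtype=np.float32)   # assumed N x D, L2-normalized
query = feats[0]                              # any descriptor can serve as a query
ranking = np.argsort(-(feats @ query))        # most similar images first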

The library also provides a generic dataset class (ImageList) that allows you to specify the list of images by providing a simple text file.

--dataset 'ImageList("PATH_TO_TEXTFILE" [, "IMAGES_ROOT"])'

Each line of the text file should contain the path to a single image:

/PATH/TO/YOUR/DATASET/images/image1.jpg
/PATH/TO/YOUR/DATASET/images/image2.jpg
/PATH/TO/YOUR/DATASET/images/image3.jpg
/PATH/TO/YOUR/DATASET/images/image4.jpg
/PATH/TO/YOUR/DATASET/images/image5.jpg

Alternatively, you can use relative paths and specify the root folder with IMAGES_ROOT.
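
For example, such a list file can be generated with a few lines of Python (hypothetical paths and file names):

from pathlib import Path

root = Path('/PATH/TO/YOUR/DATASET/images')
with open('my_images.txt', 'w') as f:
    for p in sorted(root.glob('*.jpg')):
        f.write(str(p) + '\n')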

Feature extraction with kapture datasets

Kapture is a pivot file format, based on text and binary files, used to describe SfM (Structure from Motion) and, more generally, sensor-acquired data.

It is available at https://github.com/naver/kapture, along with conversion tools for popular formats; several popular datasets are directly available in kapture.

It can be installed with:

pip install kapture

Datasets can be downloaded with:

kapture_download_dataset.py update
kapture_download_dataset.py list
# e.g.: install mapping and query of Extended-CMU-Seasons_slice22
kapture_download_dataset.py install "Extended-CMU-Seasons_slice22_*"

If you want to convert your own dataset into kapture, please find some examples here.

Once installed, you can extract global features for your kapture dataset with:

cd $DIR_ROOT
python -m dirtorch.extract_kapture --kapture-root pathto/yourkapturedataset --checkpoint dirtorch/data/Resnet101-AP-GeM-LM18.pt --gpu 0

Run python -m dirtorch.extract_kapture --help for more information on the extraction parameters.

Citations

Please consider citing the following papers in your publications if this helps your research.

@article{GARL17,
 title = {End-to-end Learning of Deep Visual Representations for Image Retrieval},
 author = {Gordo, A. and Almazan, J. and Revaud, J. and Larlus, D.},
 journal = {IJCV},
 year = {2017}
}

@inproceedings{RARS19,
 title = {Learning with Average Precision: Training Image Retrieval with a Listwise Loss},
 author = {Revaud, J. and Almazan, J. and Rezende, R.S. and de Souza, C.R.},
 booktitle = {ICCV},
 year = {2019}
}

Contributors

This library has been developed by Jerome Revaud, Rafael de Rezende, Cesar de Souza, Diane Larlus, and Jon Almazan at Naver Labs Europe.

Special thanks to Filip Radenovic. In this library, we have used the ROxford5K and RParis6K downloader from his awesome CNN-imageretrieval repository. Consider checking it out if you want to train your own models for image retrieval!

References

[1] Gordo, A., Almazan, J., Revaud, J., Larlus, D., End-to-end Learning of Deep Visual Representations for Image Retrieval. IJCV 2017

[2] Revaud, J., Almazan, J., Rezende, R.S., de Souza, C., Learning with Average Precision: Training Image Retrieval with a Listwise Loss. ICCV 2019

[3] Tolias, G., Sicre, R., Jegou, H., Particular object retrieval with integral max-pooling of CNN activations. ICLR 2016

[4] Radenovic, F., Tolias, G., Chum, O., Fine-tuning CNN Image Retrieval with No Human Annotation. TPAMI 2018
