SOLAR: Second-Order Loss and Attention for Image Retrieval

This repository contains the PyTorch implementation of our paper:

"SOLAR: Second-Order Loss and Attention for Image Retrieval"
Tony Ng, Vassileios Balntas, Yurun Tian, Krystian Mikolajczyk. ECCV 2020.
[arXiv] [short video] [long video] [ECCV Daily feature article] [OpenCV blog]

Before going further, please check out Filip Radenovic's great repository on image retrieval. Our solar-global module is heavily built upon it. If you use this code in your research, please also cite their work! [link to license]

Features

Complete test scripts for large-scale image retrieval with solar-global
Inference code for extracting local descriptors with solar-local
Second-order attention map visualisation for large images
Image matching visualisation
Training code for image retrieval

Requirements

Python 3
PyTorch tested on 1.3.0 - 1.5.1, torchvision 0.5+
opencv-python (cv2) tested on 3.3.0.10
TensorBoard tested on 2.0.0+
numpy
PIL
tqdm

Download model weights and descriptors

Begin with downloading our best models (both global and local) described in the paper, as well as the pre-computed descriptors of the 1M distractors set.

sh download.sh

The global model is saved at data/networks/resnet101-solar-best.pth and the local model at solar_local/weights/local-solar-345-liberty.pth. The descriptors of the 1M distractors are saved in the main directory (the file is quite big ~8GB, so it might take a while to download).

Testing our global descriptor

Here you can try out our pretrained model resnet101-solar-best.pth on the Revisiting Oxford and Paris dataset

Testing on R-Oxford, R-Paris

Once you've successfully downloaded the global model weights, run

python3 -m solar_global.examples.test

This script automatically downloads roxford5k,rparis6k into data/test/ and evaluates SOLAR on them. After a while, you should be able to get results as below

>> roxford5k: mAP E: 85.88, M: 69.9, H: 47.91
>> roxford5k: mP@k[1, 5, 10] E: [94.12 92.45 88.8 ], M: [94.29 90.86 86.71], H: [88.57 74.29 63.  ]

>> rparis6k: mAP E: 92.95, M: 81.57, H: 64.45
>> rparis6k: mP@k[1, 5, 10] E: [100.   96.57 95.43], M: [100.   98.   97.14], H: [97.14 94.57 93.  ]

Retrieval rankings are visualised in specs/ using

tensorboard --logdir specs/ --samples_per_plugin images=1000

You can view them on your browser at localhost:6006 in the IMAGES tab. Here's an example

You can also switch to the PROJECTOR tab and play around with TensorBoard's embedding visualisation tool. Here's an example of the 6322 database images in R-Paris, visualised with t-SNE

Testing with the extra 1-million distractors

If you decide to extract the descriptors on your own, run

(Note: this step takes a lot of time and storage, and we only provide it for verification. You can skip to the next command if you've already downloded the pre-computed descriptors from the previous step!)

python3 -m solar_global.examples.extract_1m

This script would download and extract the 1M distractors set and save them into data/test/revisitop1m/. This dataset is quite large (400GB+), so depending on your network & GPU, the whole process of downloading + extracting descriptors can take from a couple of days to a week. In our setting (~100MBps, V100), the download + extraction takes ~10 hours and the descriptors ~30 hours to be computed.

Now, make sure that resnet101-solar-best.pth_vecs_revisitop1m.pt is in the main directory, whether from the extraction step above or from the download ealier. Then you can run

python3 -m solar_global.examples.test_1m

and get results as below

>> roxford5k: mAP E: 72.04, M: 53.49, H: 29.89
>> roxford5k: mP@k[1, 5, 10] E: [88.24 81.99 76.96], M: [88.57 82.29 76.71], H: [74.29 58.29 48.86]

>> rparis6k: mAP E: 83.35, M: 59.19, H: 33.41
>> rparis6k: mP@k[1, 5, 10] E: [98.57 95.14 93.57], M: [98.57 96.29 94.86], H: [92.86 89.14 81.57]

Visualising second-order attention maps

Using our interactive visualisation tool

We provide a small demo for you to click around an image and interactively visualise the second-order attention (SOA) maps at different locations you select. (c.f. Section 4.3 in the paper for an in-depth analysis)

First, run

python3 -m demo.interactive_soa

This gorgeous image of the Eiffel Tower should pop up in a new window

Try drawing a (light green) rectangle centred at the location you would like to visualise the SOA map

A new window titled Second order attention with the SOA from the closest location in the feature map overlaid on the image, and a white dot indicating where you've selected should appear as below

Now, try drawing a rectangle in the sky, you should see the SOA more spread-out and silhouetting the main landmarks like this

You can keep clicking around the image to visualise more SOAs. Remember, the white dot in the SOA map tells you where the currently displayed attention map is selected from!

You can also try out different images by parsing the programme with

python3 -m demo.interactive_soa --image PATH/TO/YOUR/IMAGE

Jupyter-Notebook

Coming Soon!

Testing our local descriptor

Simple inference

We provide a bare-bones inference code for the local counterpart of SOLAR (Section 5.3 in the paper), so you can plug it into whatever applications you have for local descriptors.

To check that it works, run

python3 -m solar_local.example

If successful, it should display the following message

SOLAR_LOCAL - SOSNet w/ SOA layers:
SOA_3:
Num channels:    in   out   mid
                 64    64    16
SOA_4:
Num channels:    in   out   mid
                 64    64    16
SOA_5:
Num channels:    in   out   mid
                128   128    64
Descriptors shape torch.Size([512, 128])

Jupyter-Notebook

Follow our demo notebook to see a comparison between solar_local and the baseline SOSNet on an image-matching toy example.

Training SOLAR

Pre-processing the training set

As the GL18 dataset consists of only URLs, many of which have already expired, this part of the code lets you download the images we had at the time of training our models. However, this also means that extra storage space would be required for extracting tarballs, so please expect to have ~700GB upwards available. Otherwise, you could still download using GL18's downloader and save the images at data/train/gl18/jpg.

To download the images and pre-process them for training, simply run

sh gl18_preprocessing.sh

This would take sometime but you should then see around 1-million images in data/train/gl18/jpg and the pickle file data/train/gl18/db_gl18.pkl required for training.

If you downloaded the images from the URLs directly, please also make sure you download train.csv, boxes_split1.csv and boxes_split2.csv and save them into data/train/gl18. Then you can run

cd data/train/gl18 && python3 create_db_pickle.py

You should then see data/train/gl18/db_gl18.pkl successfully created.

Training

Once you've downloaded and pre-processed GL18, you can start the training with the settings described in the paper by running

python3 -m solar_global.examples.train specs/gl18 --training-dataset 'gl18' --test-datasets 'roxford5k,rparis6k' --arch 'resnet101' --pool 'gem' --p 3 --loss 'triplet' --pretrained-type 'gl18' --loss-margin 1.25 --optimizer 'adam' --lr 1e-6 -ld 1e-2 --neg-num 5 --query-size 2000 --pool-size 20000 --batch-size 8 --image-size 1024 --update-every 1 --whitening --soa --soa-layers '45' --sos --lambda 10 --no-val --print-freq 10 --flatten-desc

You can monitor the training losses and image pairs with tensorboard

tensorboard --logdir specs/

Citation

If you use this repository in your work, please cite our paper:

@inproceedings{ng2020solar,
    author    = {Ng, Tony and Balntas, Vassileios and Tian, Yurun and Mikolajczyk, Krystian},
    title     = {{SOLAR}: Second-Order Loss and Attention for Image Retrieval},
    booktitle = {ECCV},
    year      = {2020}
}

tonyngjichun/SOLAR

tonyngjichun

Reviews

Repository Details