ContextDesc implementation

TensorFlow implementation of ContextDesc for the CVPR'19 paper (oral) "ContextDesc: Local Descriptor Augmentation with Cross-Modality Context", by Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang and Long Quan.

This paper focuses on augmenting off-the-shelf local feature descriptors with two types of context: visual context from a high-level image representation, and geometric context from the keypoint distribution. If you find this project useful, please cite:

@article{luo2019contextdesc,
  title={ContextDesc: Local Descriptor Augmentation with Cross-Modality Context},
  author={Luo, Zixin and Shen, Tianwei and Zhou, Lei and Zhang, Jiahui and Yao, Yao and Li, Shiwei and Fang, Tian and Quan, Long},
  journal={Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Update 4/13/2020

The training code is released in a separate project, TFMatch, which also contains two related works (GeoDesc, ECCV'18 and ASLFeat, CVPR'20).

We release ContextDesc++_upright, which is trained with the patch orientation fixed, i.e., without rotation perturbation. This model performs notably better when test images exhibit little rotation change, or when keypoint orientation is used to normalize the input patches.

The effect of this modification is summarized below (Recall on HPatches).

Settings	ContextDesc++	ContextDesc++_upright
upright: false	74.54	72.20
upright: true	77.16	78.76
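As a rough illustration of the difference between the two training regimes, the sketch below chooses the crop angle for a training patch and builds the affine transform from patch to image coordinates. It is a hypothetical sketch, not the TFMatch training code: the helper names, the 32-pixel patch size and the perturbation range `max_perturb_deg` are all assumptions.

```python
import math
import random


def training_patch_angle(kpt_angle_deg, upright, max_perturb_deg=180.0, rng=random):
    """Angle (degrees) used to crop a training patch (hypothetical helper).

    upright=True: follow the keypoint (e.g. SIFT) orientation exactly.
    upright=False: add a random rotation perturbation of assumed range.
    """
    if upright:
        return kpt_angle_deg
    return kpt_angle_deg + rng.uniform(-max_perturb_deg, max_perturb_deg)


def patch_to_image_affine(x, y, scale, angle_deg, patch_size=32):
    """2x3 affine matrix mapping patch pixel coords (u, v) to image coords.

    The patch centre maps to the keypoint (x, y); patch axes are rotated by
    angle_deg and one patch pixel spans `scale` image pixels.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta) * scale, math.sin(theta) * scale
    half = (patch_size - 1) / 2.0
    # image = R * scale * ((u, v) - half) + (x, y)
    return [
        [c, -s, x - c * half + s * half],
        [s, c, y - s * half - c * half],
    ]
```

Assuming OpenCV is installed, a patch could then be cropped with `cv2.warpAffine(img, np.float32(M), (32, 32), flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)`, because `M` maps patch pixels to image pixels, i.e., the inverse of the warp OpenCV would otherwise expect.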

Requirements

Please use Python 3.6 and install NumPy, OpenCV (3.4.2), OpenCV-Contrib (3.4.2) and TensorFlow (1.15.2). Refer to requirements.txt for other dependencies.
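A quick check of the pinned versions can save debugging time. The sketch below compares module `__version__` strings with the versions listed above; the helper names are hypothetical, and it assumes OpenCV-Contrib is exposed through the same `cv2` module as OpenCV.

```python
import importlib

# Versions pinned in this README, keyed by import name (assumed mapping).
REQUIRED = {"cv2": "3.4.2", "tensorflow": "1.15.2"}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_mismatches(required, found):
    """Map module name -> (required, found) for every version that differs."""
    return {name: (want, found.get(name))
            for name, want in required.items()
            if found.get(name) != want}
```

For example, `version_mismatches(REQUIRED, {n: installed_version(n) for n in REQUIRED})` returns an empty dict when both packages match the pinned versions.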

Pre-trained model

We provide both the Protobuf files (for a quick start) and checkpoint files (for research purposes) for restoring the pre-trained weights, including

<contextdesc_model>
        ├── loc.pb (weights of the local feature model and matchability predictor)
        ├── aug.pb (weights of the augmentation model)
        └── model.ckpt-* (checkpoint files for both the local feature model and augmentation model)
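The sketch below picks the files to restore from a model directory laid out as above, for either the Protobuf or the checkpoint route. `resolve_model_files` is a hypothetical helper, not part of the repository; it assumes the usual TensorFlow convention that a checkpoint is addressed by its prefix (e.g. `model.ckpt-<step>`), shared by the `.index` and `.data-*` files.

```python
import glob
import os


def _ckpt_step(index_path):
    """Integer step parsed from '.../model.ckpt-<step>.index'."""
    name = os.path.basename(index_path)
    return int(name[len("model.ckpt-"):-len(".index")])


def resolve_model_files(model_dir, model_type="pb"):
    """Return the paths needed to restore a pre-trained model (hypothetical)."""
    if model_type == "pb":
        loc = os.path.join(model_dir, "loc.pb")
        aug = os.path.join(model_dir, "aug.pb")
        missing = [p for p in (loc, aug) if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError("missing protobuf files: %s" % missing)
        return {"loc": loc, "aug": aug}
    if model_type == "ckpt":
        index_files = glob.glob(os.path.join(model_dir, "model.ckpt-*.index"))
        if not index_files:
            raise FileNotFoundError("no model.ckpt-*.index file in %s" % model_dir)
        # Pick the highest step numerically, not lexicographically.
        latest = max(index_files, key=_ckpt_step)
        return {"ckpt": latest[:-len(".index")]}
    raise ValueError("model_type must be 'pb' or 'ckpt'")
```

Comparing step numbers as integers avoids choosing `model.ckpt-900` over `model.ckpt-1000`, which a plain string sort would do.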

Several variants of ContextDesc from the paper are provided for study:

Name	Downloads	Descriptions
Retrieval model	Link	(Regional feature) An image retrieval model trained on Google-Landmarks Dataset that provides high-level image representation to enrich visual context. More details can be found in the supplementary material.
DELF Retrieval model	Link	(Regional feature) DELF retrieval model for general purposes.
ContextDesc	Link	(Base) Use GeoDesc [1] (ECCV'18) as the local feature model, and train only the augmentation model.
ContextDesc+	Link	(Better) Train the local feature model and augmentation model separately with the proposed scale-aware N-pair loss.
ContextDesc++	Link	(Best) End-to-end train both the local feature and augmentation models.
ContextDesc++_upright	Link	(Post-CVPR update) End-to-end train both the local feature and augmentation models, with the patch orientation fixed (i.e., no perturbation and aligned to SIFT orientation) during training.
Dense-ContextDesc	Link	(Post-CVPR update) Densely extract features from the entire input image (instead of image patch). Details can be found here.
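With a dense model, per-keypoint descriptors have to be read off a dense feature map. The README does not describe how Dense-ContextDesc does this; a common approach, shown here only as an assumption, is bilinear interpolation at the keypoint locations followed by L2 normalization. Keypoints are assumed to be already rescaled to feature-map coordinates (e.g. divided by the network stride).

```python
import numpy as np


def sample_descriptors(feat_map, kpts_xy):
    """Bilinearly sample L2-normalized descriptors at keypoint locations.

    feat_map: (H, W, C) dense feature map.
    kpts_xy:  (N, 2) array of (x, y) in feature-map pixel coordinates.
    Returns an (N, C) array.
    """
    h, w, _ = feat_map.shape
    x = np.clip(kpts_xy[:, 0], 0, w - 1)
    y = np.clip(kpts_xy[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    # Weighted sum of the four neighbouring feature vectors.
    desc = ((1 - wx) * (1 - wy) * feat_map[y0, x0]
            + wx * (1 - wy) * feat_map[y0, x1]
            + (1 - wx) * wy * feat_map[y1, x0]
            + wx * wy * feat_map[y1, x1])
    norm = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norm, 1e-12)
```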

The TensorFlow network definition can be found here. Its usage is demonstrated in the image matching example.

Training data

Training data is released in GL3D, and training code is available in TFMatch.

Example scripts

1. Test image matching

To get started, clone the repo and download a pre-trained model (ContextDesc++ in this example; see the download links above):

git clone https://github.com/lzx551402/contextdesc.git && \
cd /local/contextdesc/pretrained && \
tar -xvf contextdesc_pp.tar

Then simply run:

cd /local/contextdesc && python image_matching.py

The matching results from SIFT features (top), raw local features (middle) and augmented features (bottom) will be displayed.
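The README does not specify the matching strategy used by image_matching.py. As a hedged illustration only, the sketch below implements mutual nearest-neighbour matching with an optional Lowe ratio test, a common way to match L2-normalized descriptors such as the raw and augmented features compared above.

```python
import numpy as np


def mutual_nn_match(desc1, desc2, ratio=None):
    """Mutual nearest-neighbour matching of L2-normalized descriptors.

    Returns an (M, 2) int array of index pairs (i, j). If `ratio` is set,
    Lowe's ratio test is also applied in the desc1 -> desc2 direction.
    """
    # For unit vectors, squared L2 distance = 2 - 2 * cosine similarity.
    sim = desc1 @ desc2.T
    dist = np.sqrt(np.maximum(2.0 - 2.0 * sim, 0.0))
    nn12 = np.argmin(dist, axis=1)  # best match in desc2 for each desc1 row
    nn21 = np.argmin(dist, axis=0)  # best match in desc1 for each desc2 row
    idx1 = np.arange(desc1.shape[0])
    keep = nn21[nn12] == idx1       # keep only mutual agreements
    if ratio is not None and desc2.shape[0] >= 2:
        two_best = np.sort(dist, axis=1)[:, :2]
        keep &= two_best[:, 0] < ratio * two_best[:, 1]
    return np.stack([idx1[keep], nn12[keep]], axis=1)
```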

To test the performance of a dense model, call the script with --dense_desc.
To use the TensorFlow checkpoint file for parameter restoring, call the script with --type ckpt.
Type python image_matching.py --h to view more options and test on your own images.
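If you script several runs, a hypothetical helper like the one below assembles the command from the flags mentioned above. Only `--dense_desc` and `--type ckpt` come from this README; whether the two can be combined is an assumption, so check `python image_matching.py --h` first.

```python
import sys


def matching_command(dense_desc=False, use_ckpt=False, python=sys.executable):
    """Build an image_matching.py command line (hypothetical helper)."""
    cmd = [python, "image_matching.py"]
    if dense_desc:
        cmd.append("--dense_desc")      # test the dense model
    if use_ckpt:
        cmd += ["--type", "ckpt"]       # restore from the TF checkpoint
    return cmd
```

For example: `subprocess.run(matching_command(use_ckpt=True), check=True, cwd="/local/contextdesc")`.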

2. Evaluation on HPatches Sequences

First, download HPSequences (full image sequences of HPatches [3] and their corresponding homographies).

Second, download our CVPR intermediate results of keypoint locations and image patches for HPSequences (Link). Note that this link is reported to be broken, so you may need to regenerate the input.

Unzip both downloads into the same folder; each .ppm image should then sit alongside a .pkl file.
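Before running the evaluation, it can help to confirm that every image has its intermediate file. The sketch below assumes, hypothetically, that each .pkl shares the base name of its .ppm image (e.g. `1.ppm` and `1.pkl`); adjust the naming if the downloaded files differ.

```python
import os


def find_missing_pkl(root):
    """List .ppm images under `root` without a same-named .pkl file."""
    missing = []
    for dirpath, _, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            # Assumed naming convention: <stem>.ppm pairs with <stem>.pkl.
            if ext.lower() == ".ppm" and stem + ".pkl" not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)
```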

Finally, to reproduce our CVPR results, configure the data root in configs/hseq_eval.yaml and call the evaluation script:

cd /local/contextdesc && python hseq_eval.py --function hseq_eval --config configs/hseq_eval.yaml

For ContextDesc++, you should see a Recall of 67.35/77.33 for i/v sequences, close to the results reported in the original paper (67.53/77.20).

The updated results can be obtained by setting suffix to null in configs/hseq_eval.yaml. Owing to some tweaks to the keypoint detector and patch extractor, this yields better results: 70.10/78.83 for i/v sequences.
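The single HPatches Recall of 74.54 reported for ContextDesc++ in the update table is consistent with a sequence-weighted average of these i/v numbers, assuming the standard HPatches split of 57 illumination and 59 viewpoint sequences. This is an observation from the arithmetic, not a statement of how the evaluation script aggregates recall.

```python
# Assumed HPatches split: 57 illumination (i) and 59 viewpoint (v) sequences.
N_ILLUM, N_VIEW = 57, 59


def overall_recall(recall_i, recall_v, n_i=N_ILLUM, n_v=N_VIEW):
    """Sequence-weighted mean of the i/v recalls."""
    return (n_i * recall_i + n_v * recall_v) / (n_i + n_v)


print(round(overall_recall(70.10, 78.83), 2))  # prints 74.54
```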

To test Dense-ContextDesc, set suffix to null and dense_desc to true in configs/hseq_eval.yaml; this gives 76.95/75.58 for i/v sequences. This suggests that dense prediction (with a global input normalization) is beneficial for illumination changes. However, since scale/rotation changes are handled in the intermediate feature maps, the invariance to perspective (viewpoint) changes is weakened.
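The README does not define "global input normalization". Assuming it means standardizing the whole image with a single mean and standard deviation, the sketch below shows one property of such a normalization: it cancels any global brightness/contrast change `a * I + b` with `a > 0`. It illustrates the operation only and is not a full explanation of the illumination results above.

```python
import numpy as np


def global_normalize(img, eps=1e-8):
    """Standardize a whole image with one mean and one standard deviation."""
    img = np.asarray(img, dtype=np.float64)
    # eps guards against division by zero on constant images.
    return (img - img.mean()) / (img.std() + eps)
```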

3. Benchmark on image matching

ContextDesc, together with a learned matcher [4], won both the stereo and multi-view image matching tracks at IMW2019. We provide a script that prepares the ContextDesc features and formats the submission files for this challenge.

To get started, follow the challenge instructions to download the test data.

Next, configure the data paths (data_root, dump_root and submission_root) in configs/imw2019_eval.yaml.

Then call the evaluation script by:

cd /local/contextdesc && python evaluations.py --config configs/imw2019_eval.yaml

You may then compress and submit the results to the challenge website.
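The archive format expected by the challenge is not specified here, so check the IMW2019 instructions first. Assuming a zip archive is accepted, the standard library can pack the contents of submission_root; `pack_submission` is a hypothetical helper name.

```python
import os
import shutil


def pack_submission(submission_root, archive_base):
    """Zip the contents of `submission_root` into `archive_base`.zip.

    Keep `archive_base` outside `submission_root` so the archive does not
    try to include itself. Returns the path of the created archive.
    """
    if not os.path.isdir(submission_root):
        raise FileNotFoundError(submission_root)
    return shutil.make_archive(archive_base, "zip", root_dir=submission_root)
```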

4. Benchmark on visual localization

ContextDesc also achieved competitive results on a visual localization benchmark. Please download the Aachen Day-Night dataset and follow the evaluation instructions to prepare the evaluation data.

Next, configure the data paths (data_root and dump_root) in configs/aachen_eval.yaml.

Then extract the features by:

cd /local/contextdesc && python evaluations.py --config configs/aachen_eval.yaml

The extracted features will be saved alongside their corresponding images, e.g., the features for image /local/Aachen_Day-Night/images/images_upright/db/1000.jpg will be in the file /local/Aachen_Day-Night/images/images_upright/db/1000.jpg.contextdesc10k_upright (the method name here is contextdesc10k_upright).
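To check that extraction finished for every image, the sketch below applies the naming convention from the example above (`<image path>.<method name>`). The helper names and the `.jpg`-only filter are assumptions.

```python
import os


def feature_path(image_path, method_name="contextdesc10k_upright"):
    """Feature file written next to an image: <image path>.<method name>."""
    return "%s.%s" % (image_path, method_name)


def images_without_features(image_dir, method_name="contextdesc10k_upright",
                            exts=(".jpg",)):
    """List images under `image_dir` whose feature file does not exist yet."""
    missing = []
    for dirpath, _, filenames in os.walk(image_dir):
        for name in filenames:
            if name.lower().endswith(exts):
                path = os.path.join(dirpath, name)
                if not os.path.isfile(feature_path(path, method_name)):
                    missing.append(path)
    return sorted(missing)
```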

Finally, refer to the evaluation script to generate and submit the results to the challenge website.

Notes

Refer to the example configuration file for how to evaluate with different settings.

References

[1] GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints, Zixin Luo, Tianwei Shen, Lei Zhou, Siyu Zhu, Runze Zhang, Yao Yao, Tian Fang, Long Quan, ECCV 2018.

[2] Matchable Image Retrieval by Learning from Surface Reconstruction, Tianwei Shen*, Zixin Luo*, Lei Zhou, Runze Zhang, Siyu Zhu, Tian Fang, Long Quan, ACCV 2018.

[3] HPatches: A benchmark and evaluation of handcrafted and learned local descriptors, Vassileios Balntas*, Karel Lenc*, Andrea Vedaldi and Krystian Mikolajczyk, CVPR 2017.

[4] Learning Two-View Correspondences and Geometry Using Order-Aware Network, Jiahui Zhang*, Dawei Sun*, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao, ICCV 2019.

Changelog

2020-1-13

Add HPatches Sequences evaluation.

2019-7-18

Add TensorFlow network definition.

2019-7-17

A major code refactoring.
Add evaluation instructions on image matching and visual localization benchmark.
Add experimental Dense-ContextDesc model.
