
DISK

Official code release for DISK: learning local features with policy gradient. If you use this code in your work, please cite us as

@article{tyszkiewicz2020disk,
  title={DISK: Learning local features with policy gradient},
  author={Tyszkiewicz, Micha{\l} and Fua, Pascal and Trulls, Eduard},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

Table of contents

  1. Installation
  2. Inference
  3. Training
  4. Extending

Installation

  1. Clone this repo recursively (git clone --recursive), so that its submodules are fetched as well
  2. cd into this repo: the next step uses relative paths
  3. Execute pip install --user -r requirements.txt

Inference

Feature extraction

To extract features, execute

python detect.py h5_artifacts_destination images_directory

This should create h5_artifacts_destination/keypoints.h5 and h5_artifacts_destination/descriptors.h5, compatible with the IMW benchmark. By default the model uses a 4-layer U-Net architecture, which means that the image dimensions have to be a multiple of 16: for this reason you will probably want to specify the --height and --width flags to scale the input images accordingly. The images will be scaled preserving their aspect ratio (0-padding the missing values), and the keypoint locations will be rescaled and filtered with respect to the original image dimensions.
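
A quick way to pick valid --height and --width values is to round the target dimensions up to the next multiple of 16. A minimal sketch (round_up is a hypothetical helper, not part of this repo):

def round_up(x: int, multiple: int = 16) -> int:
    # smallest multiple of `multiple` that is >= x
    return ((x + multiple - 1) // multiple) * multiple

# e.g. for 1920x1080 inputs:
print(round_up(1080), round_up(1920))  # 1088 1920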

You can use --help to learn about other options; in particular, it is possible to specify the weights file with --model-path. We provide depth-save.pth, the checkpoint trained with depth-based reward and reported in the paper (the default), as well as epipolar-save.pth, the checkpoint trained with epipolar reward and shown in the supplementary material.
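
The generated artifacts can be inspected programmatically with h5py. A minimal sketch, assuming the IMW-style layout in which each file stores one dataset per image name (the exact keys and descriptor width are worth verifying against your own output):

import h5py

# paths produced by the detect.py invocation above
with h5py.File('h5_artifacts_destination/keypoints.h5', 'r') as kps, \
     h5py.File('h5_artifacts_destination/descriptors.h5', 'r') as descs:
    for name in kps.keys():
        # one dataset per image: keypoints [N, 2], descriptors [N, D]
        print(name, kps[name].shape, descs[name].shape)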

Keypoint matching

Execute

python match.py h5_artifacts_destination

(or use --help to learn about other options). This should create h5_artifacts_destination/matches.h5. Note that this file is not compatible with the IMW benchmark: instead of saving matches as {image_name_1}-{image_name_2}, it saves them as {image_name_1}/{image_name_2}, which creates HDF5 groups and therefore allows this approach to scale to large image collections (saving HDF5 files with more than 300k top-level groups becomes painfully slow due to hashing overhead).
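
Because of that nested layout, reading matches.h5 means walking one level of HDF5 groups. A minimal h5py sketch (the exact contents stored under each image pair are an assumption; verify against your own output):

import h5py

with h5py.File('h5_artifacts_destination/matches.h5', 'r') as matches:
    for name_1, group in matches.items():    # top-level group: first image
        for name_2, pair in group.items():   # nested entry: second image
            # `pair` holds the match data for (name_1, name_2),
            # e.g. indices into the two keypoint lists
            print(name_1, name_2, pair)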

Viewing results

The view_h5.py script can be used to view artifacts generated by detect.py and match.py.

Exporting to COLMAP

After features are detected and matched, the results can be converted into a COLMAP-compatible database format with colmap/h5_to_db.py h5_artifacts_location raw_images_location. Note that the features are inserted WITHOUT their descriptors, so our match.py has to be used to perform the matching beforehand. At the same time, match.py doesn't run pose estimation, so the exhaustive feature matching stage of the COLMAP pipeline still has to be run. An example pipeline is shown below:

# assume we have the images in scene/images
python detect.py --height 1024 --width 1024 --n 2048 scene/h5 scene/images
python match.py --rt 0.95 --save-threshold 100 scene/h5
python colmap/h5_to_db.py --database-path scene/database.db scene/h5 scene/images

# don't use GPU since we aren't computing the descriptor distance matrices anyway,
# only RANSAC
colmap exhaustive_matcher --database_path scene/database.db --SiftMatching.use_gpu 0
mkdir scene/sparse
colmap mapper --database_path scene/database.db --image_path scene/images --output_path scene/sparse

Please try h5_to_db.py --help for additional options.

Training

The training script

Assuming data is available, python train.py DATASETS_LOCATION starts training. The --reward switch allows for choosing the reward scheme (depth or epipolar). For more information, execute python train.py --help.

Reproducing our results

The data we used for training and validation can be downloaded by executing the download_dataset script (~164 GB). It will download the data from datasets.epfl.ch into disk-data/, and this is the path that should be given to train.py. The default settings of the script will learn with the inverse softmax matching temperature inverse_T (called θ_M in the paper) annealed from 15 to 50 over the course of the first 20 epochs. We then pick the best checkpoint according to validation AUC, as reported by python compute_validation_auc.py TENSORBOARD_LOG_FILE. Following this schedule allowed us to obtain 0.50432 stereo AUC and 0.72624 multiview AUC on the IMW2020 test set with 2k features, slightly less than reported in the paper (0.51315 and 0.72705, respectively).

The paper results (available as depth-save.pth, the default checkpoint in detect.py) were obtained through an ad-hoc schedule of annealing θ_M between 15 and 25 over 10 epochs and then training for a further 40 epochs. We picked the best checkpoint obtained this way (the 39th) and fine-tuned it with a schedule of θ_M = 25 + epoch_number for another 50 epochs, obtaining the best model at the 20th epoch (θ_M = 45). We now default to the simpler schedule described above for simplicity, while disclosing the original process here.

As people often request this, we have uploaded the cached results for the MMA metric on HPatches (Figure 5 in the NeurIPS paper) to this repository: they are available in the results/hpatches folder. You can read them with this notebook, similarly to the cached results provided by that repository.

Low GPU memory training

We performed our experiments with the 32 GB version of the Nvidia V100 GPU. However, running python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500 should be functionally equivalent to that setup and fit within 11-12 GB GPUs (note that training in this mode may take on the order of 2 weeks!).

Custom data preparation

Alternatively, one can use a custom dataset laid out in the proper format, as explained in more depth here. We provide a script to automate that process for photo collections posed with COLMAP.

Creating new datasets by importing from COLMAP

A new dataset (for instance with custom scenes) can be created by importing from COLMAP outputs. One should run COLMAP on the images, including the image rectification and patch-match depth estimation steps. This should leave the user with a directory structured as follows:

$ tree colmap_output
colmap_output/
├── images
│   ├── 2020_07_25__12_09_03.jpg
│   ├── 2020_07_25__12_09_05.jpg
│   ├── ...
├── run-colmap-geometric.sh
├── run-colmap-photometric.sh
├── sparse
│   ├── cameras.bin
│   ├── images.bin
│   └── points3D.bin
└── stereo
    ├── consistency_graphs
    ├── depth_maps
    │   ├── 2020_07_25__12_09_03.jpg.geometric.bin
    │   ├── 2020_07_25__12_09_03.jpg.photometric.bin
    │   ├── 2020_07_25__12_09_05.jpg.geometric.bin
    │   ├── 2020_07_25__12_09_05.jpg.photometric.bin
    │   ├── ...
    ├── fusion.cfg
    ├── normal_maps
    │   ├── ...
    └── patch-match.cfg

One can then execute python colmap/colmap2dataset.py colmap_output --name my_scene to create an extra "dataset" directory:

$ tree colmap_output/dataset/
colmap_output/dataset/
├── calibration
│   ├── calibration_2020_07_25__12_09_03.jpg.h5
│   ├── calibration_2020_07_25__12_09_05.jpg.h5
│   ├── ..
├── dataset.json
└── depth
    ├── 2020_07_25__12_09_03.h5
    ├── 2020_07_25__12_09_05.h5
    ├── ..

The dataset.json file is used to instantiate DISK dataloaders; it contains a collection of absolute paths to the contents of colmap_output/dataset and colmap_output/images, so those should not be moved afterwards. colmap_output/stereo and colmap_output/sparse, on the other hand, can be safely deleted to conserve disk space.

To merge multiple scenes into a single dataset, one can execute python colmap/merge_datasets.py my_scene_1/dataset/dataset.json my_scene_2/dataset/dataset.json ... to obtain a single file called merged.json which contains all the scenes (and still references the files in their original locations for each scene!). Scenes with repeating names (as given by the --name flag of colmap2dataset) will be renamed to unique (but non-informative) names.

Extending

We tried to keep the code easy to understand and reasonably documented. Please open an issue if problems are encountered.

@dimchecked

We extensively use torch_dimcheck (the @dimchecked function decorator) to clarify function signatures; please refer to that repository for more information.
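
For illustration, a minimal sketch of the annotation style (assuming torch_dimcheck's list annotations, where a string names a dimension that must agree everywhere it appears; see that repository for the authoritative API):

import torch
from torch_dimcheck import dimchecked

@dimchecked
def pairwise_distances(a: ['N', 'C'], b: ['M', 'C']) -> ['N', 'M']:
    # shapes are verified at call time: the 'C' of `a` must match the 'C' of `b`
    return torch.cdist(a, b)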

NpArray

We often deal with collections of tensors which are semantically batched but have different shapes (such as lists of features in different images of the same scene). Since PyTorch doesn't have the concept of jagged tensors, we wrap them in numpy arrays with dtype=object, rather than in standard lists. This allows us to retain numpy's reshaping, stacking and indexing functionality on those collections. In signatures, these are often annotated with the NpArray type annotation.
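
A minimal sketch of the pattern (with hypothetical data, not DISK's actual loaders):

import numpy as np
import torch

# per-image keypoint tensors of different lengths: "jagged" data
features = [torch.randn(128, 2), torch.randn(75, 2), torch.randn(201, 2)]

# wrap them in a dtype=object array instead of a plain list
wrapped = np.empty(len(features), dtype=object)
wrapped[:] = features

# numpy-style indexing and reshaping now work on the collection itself
subset = wrapped[[0, 2]]         # fancy indexing picks a sub-collection
column = wrapped.reshape(-1, 1)  # reshapes the collection, not the tensors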
