EFANNA: an Extremely Fast Approximate Nearest Neighbor search Algorithm framework based on kNN graph

EFANNA is a flexible and efficient library for approximate nearest neighbor search (ANN search) on large-scale data. It implements the algorithms of our paper EFANNA: Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph.
EFANNA provides fast solutions to both approximate kNN graph construction and ANN search. EFANNA can also flexibly adopt many kinds of hierarchical structures for initialization, such as random projection trees, hierarchical clustering trees, multi-table hashing, and so on.

What's new

  • Please see our more advanced search algorithm NSG. (Jan 13, 2018)
  • The paper was updated significantly. (Dec 6, 2016)
  • Algorithm improved and AVX instructions supported. (Nov 30, 2016)
  • Parallelism with OpenMP. (Sep 26, 2016)

Benchmark data sets

SIFT1M and GIST1M.

ANN search performance

The performance was tested without parallelism.

[Figures: ANN search performance on SIFT 1-NN, SIFT 100-NN, GIST 1-NN, and GIST 100-NN]
Compared Algorithms:

  • KGraph
  • FLANN
  • IEH: Fast and accurate hashing via iterative nearest neighbors expansion
  • GNNS: Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph

kNN Graph Construction Performance

The performance was tested without parallelism.

[Figures: kNN graph construction performance on SIFT 1-NN graph and SIFT 100-NN graph]

Compared Algorithms:

  • KGraph (same as NN-descent)
  • NN-expansion (same as GNNS)
  • SGraph: Scalable k-NN graph construction for visual descriptors
  • FastKNN: Fast kNN Graph Construction with Locality Sensitive Hashing
  • LargeVis: Visualizing Large-scale and High-dimensional Data

How To Compile

Go to the root directory of EFANNA and run make.

cd efanna/
make

How To Use

EFANNA uses a composite index to carry out ANN search, which consists of an approximate kNN graph and a number of tree structures. They can be built by this library as a whole or separately.

You may build the kNN graph separately for other uses, such as other graph-based machine learning algorithms.

Below are some demos.

  • kNN graph building:

      cd efanna/samples/   
      ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 8 30 25 10 10   
    

Meaning of the parameters (from left to right):

sift_base.fvecs -- database points
sift.graph -- graph built by EFANNA

8 -- number of trees used to build the graph (larger is more accurate but slower)
8 -- conquer-to-depth (smaller is more accurate but slower)
8 -- number of iterations to build the graph

30 -- L (larger is more accurate but slower, no smaller than K)
25 -- check (larger is more accurate but slower, no smaller than K)
10 -- K, for the kNN graph
10 -- S (larger is more accurate but slower)
  • tree building:

      cd efanna/samples/
      ./efanna_index_buildtrees sift_base.fvecs sift.trees 32
    

    Meaning of the parameters (from left to right):

    sift_base.fvecs -- database points
    sift.trees -- truncated KD-trees built by EFANNA
    32 -- number of trees to build

  • building the whole index at once:

      cd efanna/samples/
      ./efanna_index_buildall sift_base.fvecs sift.graph sift.trees 32 8 8 200 200 100 10 8
    

    Meaning of the parameters (from left to right):

    sift_base.fvecs -- database points
    sift.trees -- truncated KD-trees built by EFANNA
    sift.graph -- approximate kNN graph built by EFANNA
    32 -- number of trees in total for building the index
    8 -- conquer-to-depth
    8 -- iteration number
    200 -- L (larger is more accurate but slower, no smaller than K)
    200 -- check (larger is more accurate but slower, no smaller than K)
    100 -- K, for the kNN graph
    10 -- S (larger is more accurate but slower)
    8 -- 8 out of the 32 trees are used for building the graph

  • ANN search:

      cd efanna/samples/
      ./efanna_search sift_base.fvecs sift.trees sift.graph sift_query.fvecs sift.results 16 4 1200 200 10
    

    Meaning of the parameters (from left to right):

    sift_base.fvecs -- database points
    sift.trees -- prebuilt truncated KD-trees used for search
    sift.graph -- prebuilt kNN graph
    sift_query.fvecs -- query points
    sift.results -- path to save the ANN search results of the given queries
    16 -- number of trees to use (no greater than the number of prebuilt trees)
    4 -- number of epochs
    1200 -- pool size factor (larger is more accurate but slower, usually 6~10 times the extend factor)
    200 -- extend factor (larger is more accurate but slower)
    10 -- required number of returned neighbors (i.e., the k of k-NN)

  • Evaluation:

      cd efanna/samples/
      ./evaluate sift.results sift_groundtruth.ivecs 10
    

    Meaning of the parameters (from left to right):

    sift.results -- search results file
    sift_groundtruth.ivecs -- ground truth file
    10 -- evaluate the 10-NN accuracy (only the first 10 points returned by the algorithm are examined, counting how many are among the true 10 nearest neighbors of the query)
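
The accuracy evaluated here is the standard k-NN recall: the fraction of returned points that appear among the true k nearest neighbors, averaged over all queries. A minimal sketch of this computation, assuming the returned and ground-truth neighbor IDs are already loaded into memory (the function name is illustrative, not part of EFANNA):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // k-NN recall: how many of the first k returned IDs per query appear
    // among that query's true k nearest neighbors, averaged over all queries.
    double knn_recall(const std::vector<std::vector<int>>& returned,
                      const std::vector<std::vector<int>>& groundtruth,
                      std::size_t k) {
        std::size_t hits = 0;
        for (std::size_t q = 0; q < returned.size(); ++q) {
            const auto& gt = groundtruth[q];
            std::size_t gk = std::min(k, gt.size());
            for (std::size_t i = 0; i < k && i < returned[q].size(); ++i)
                if (std::find(gt.begin(), gt.begin() + gk, returned[q][i]) != gt.begin() + gk)
                    ++hits;
        }
        return static_cast<double>(hits) / (static_cast<double>(returned.size()) * k);
    }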

See our paper or user manual for more details about the parameters and interfaces.
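
In code, the pipeline above revolves around the FIndex class. The sketch below shows how the pieces might fit together; only the FIndex constructor is taken verbatim from the samples (it reappears in the SSE/AVX section further down), while the header name and the method names are hypothetical placeholders, so check samples/*.cc for the exact interface:

    // Hypothetical sketch; header and method names are placeholders.
    #include "efanna.hpp"  // placeholder header name

    void build_index_sketch(Matrix<float>& dataset) {  // Matrix type assumed
        // Parameter values follow the graph-building demo above.
        unsigned trees = 8, mlevel = 8, epochs = 8, L = 30, checkK = 25, kNN = 10, S = 10;

        // Constructor as it appears in samples/*.cc.
        FIndex<float> index(dataset, new L2DistanceAVX<float>(),
                            efanna::KDTreeUbIndexParams(true, trees, mlevel, epochs,
                                                        checkK, L, kNN, trees, S));

        index.buildIndex();             // placeholder: build trees and the kNN graph
        index.saveGraph("sift.graph");  // placeholder: persist the graph
        index.saveTrees("sift.trees");  // placeholder: persist the trees
    }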

Output format

The file formats of the approximate kNN graph and the ANN search results are similar.
Suppose the database has N points, numbered 0 to N-1, and you build an approximate kNN graph. The graph can be regarded as an N x k matrix, and the saved binary file stores this matrix row by row. In each row, the first 4 bytes hold the int value of k, the next 4 bytes hold the value of M, and the next 4 bytes hold the float norm of the point. Then follow k*4 bytes storing the indices of the k nearest neighbors of that point. The N rows are stored contiguously without separating characters.

Similarly, suppose the query set has n points, numbered 0 to n-1, and you ask EFANNA to return k nearest neighbors for each query. The result file stores n rows in the same fashion as the graph file, saving the returned indices row by row: each row starts with 4 bytes recording the value of k, followed by k*4 bytes recording the neighbors' indices.
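
Both layouts are plain binary and straightforward to parse. A minimal reader sketch for the two formats described above (function names are illustrative, not part of EFANNA):

    #include <cstdio>
    #include <vector>

    // One row of the kNN graph file: k, M, norm, then k neighbor indices.
    struct GraphRow {
        int k;
        int M;
        float norm;
        std::vector<int> neighbors;
    };

    // Graph file layout per row: int k | int M | float norm | k * int indices.
    std::vector<GraphRow> load_graph(const char* path) {
        std::vector<GraphRow> rows;
        FILE* f = std::fopen(path, "rb");
        if (!f) return rows;
        GraphRow r;
        while (std::fread(&r.k, sizeof(int), 1, f) == 1) {
            std::fread(&r.M, sizeof(int), 1, f);
            std::fread(&r.norm, sizeof(float), 1, f);
            r.neighbors.resize(r.k);
            std::fread(r.neighbors.data(), sizeof(int), r.k, f);
            rows.push_back(r);
        }
        std::fclose(f);
        return rows;
    }

    // Result file layout per row: int k | k * int indices.
    std::vector<std::vector<int>> load_results(const char* path) {
        std::vector<std::vector<int>> rows;
        FILE* f = std::fopen(path, "rb");
        if (!f) return rows;
        int k;
        while (std::fread(&k, sizeof(int), 1, f) == 1) {
            std::vector<int> row(k);
            std::fread(row.data(), sizeof(int), k, f);
            rows.push_back(std::move(row));
        }
        std::fclose(f);
        return rows;
    }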

Input of EFANNA

Because there is no unified format for input data, users may need to write their own input function to read their data. You may imitate the input function in our sample code (samples/efanna_index_buildgraph.cc) to load the data into our matrix.

To benefit from SIMD optimization, pay attention to the data alignment requirements of the SSE/AVX instructions.
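
The SIFT/GIST benchmark files use the common .fvecs layout: each vector is stored as a 4-byte int dimension followed by that many 4-byte floats. A loader sketch under that assumption, padding each row so SSE/AVX loads stay aligned (the function name is illustrative):

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    // Loads an .fvecs file: each record is a 4-byte int dim followed by dim floats.
    // Rows are padded to a multiple of 8 floats (32 bytes) and stored in one
    // 32-byte-aligned buffer so per-row SSE/AVX loads are safe.
    float* load_fvecs_aligned(const char* path, unsigned& num, unsigned& dim) {
        FILE* f = std::fopen(path, "rb");
        if (!f) return nullptr;
        int d = 0;
        std::fread(&d, sizeof(int), 1, f);
        dim = static_cast<unsigned>(d);
        std::fseek(f, 0, SEEK_END);
        long fsize = std::ftell(f);
        num = static_cast<unsigned>(fsize / ((dim + 1) * sizeof(float)));
        std::fseek(f, 0, SEEK_SET);

        unsigned padded = (dim + 7) / 8 * 8;
        size_t bytes = sizeof(float) * padded * static_cast<size_t>(num);
        float* data = static_cast<float*>(std::aligned_alloc(32, bytes));
        std::memset(data, 0, bytes);  // zero the padding floats
        for (unsigned i = 0; i < num; ++i) {
            std::fseek(f, sizeof(int), SEEK_CUR);  // skip the per-record dim header
            std::fread(data + static_cast<size_t>(i) * padded, sizeof(float), dim, f);
        }
        std::fclose(f);
        return data;
    }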

Comparing with EFANNA without parallelism and SSE/AVX instructions

To disable the parallelism, there is no need to modify the code. Simply

    export OMP_NUM_THREADS=1

before you run the code. Then the code will only use one thread. This is a very convenient way to control the number of threads used.

To disable SSE/AVX instructions, you need to modify samples/xxxx.cc, find the line

    FIndex<float> index(dataset, new L2DistanceAVX<float>(), efanna::KDTreeUbIndexParams(true, trees, mlevel, epochs, checkK, L, kNN, trees, S));

Change L2DistanceAVX to L2Distance and rebuild the project. Now the SSE/AVX instructions are disabled. If you want to try SSE instead of AVX, use L2DistanceSSE.
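
For example, the same line with the plain scalar distance:

    FIndex<float> index(dataset, new L2Distance<float>(), efanna::KDTreeUbIndexParams(true, trees, mlevel, epochs, checkK, L, kNN, trees, S));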

Parameters to reproduce Fig. 4/5 (10-NN approximate graph construction) in our paper

SIFT1M:

    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 0 20 10 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 1 20 10 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 2 20 10 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 3 20 10 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 5 20 10 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 6 20 20 10 10
    ./efanna_index_buildgraph sift_base.fvecs sift.graph 8 8 6 20 30 10 10

GIST1M:

    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 2 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 3 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 4 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 5 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 6 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 7 30 30 10 10
    ./efanna_index_buildgraph gist_base.fvecs gist.graph 8 8 10 30 40 10 10

Acknowledgment

Our code framework imitates FLANN to make it scalable, and the implementation of NN-descent is taken from KGraph, whose authors proposed the NN-descent algorithm. Many thanks to them for the inspiration.

To do

  • Add more choices of initialization algorithms
