Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data

Neighborhood Graph and Tree for Indexing High-dimensional Data

NGT provides commands and a library for performing high-speed approximate nearest neighbor searches against large volumes of data in high-dimensional vector spaces (from tens to several thousand dimensions).
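
For context, the exact search that an approximate method trades accuracy against can be sketched in a few lines. The following is only a brute-force baseline written for illustration, not NGT code; the function names `l2_distance` and `exact_knn` are hypothetical. It scans every object, which is what becomes too slow for large, high-dimensional datasets.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(objects, query, k):
    """Return the k closest (distance, id) pairs by scanning every object."""
    scored = ((l2_distance(vec, query), idx) for idx, vec in enumerate(objects))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Toy 4-dimensional data; real workloads have far more objects and dimensions.
    objects = [
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 2.0, 0.0, 0.0],
        [3.0, 3.0, 3.0, 3.0],
    ]
    for distance, object_id in exact_knn(objects, [0.9, 0.1, 0.0, 0.0], k=2):
        print(object_id, round(distance, 3))
```

The cost of this scan grows linearly with both the number of objects and the dimensionality, which is the motivation for index structures that inspect only a small part of the data.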

News

  • 08/10/2022 QBG (Quantized Blob Graph) and QG (renewed NGTQG) are now available. The command-line interfaces ngtq and ngtqg are now obsolete and have been replaced by qbg. (v2.0.0)
  • 02/04/2022 FP16 (half-precision floating point) is now available. (v1.14.0)
  • 03/12/2021 The results for the quantized graph are added to this README.
  • 01/15/2021 NGT v1.13.0 to provide the quantized graph (NGTQG) is released.
  • 11/04/2019 NGT tutorial has been released.
  • 06/26/2019 Jaccard distance is available. (v1.7.6)
  • 06/10/2019 PyPI NGT package v1.7.5 is now available.
  • 01/17/2019 Python NGT can be installed via pip from PyPI. (v1.5.1)
  • 12/14/2018 NGTQ (NGT with Quantization) is now available. (v1.5.0)
  • 08/08/2018 ONNG is now available. (v1.4.0)

Methods

This repository provides the following methods.

  • NGT: Graph and tree-based method
  • QG: Quantized graph-based method
  • QBG: Quantized blob graph-based method
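
As a rough intuition for graph-based search in general, the sketch below builds a small k-nearest-neighbor graph by brute force and then walks it greedily toward a query. It is a simplified illustration under stated assumptions, not NGT's implementation: the names `build_knn_graph` and `greedy_search` are hypothetical, and the seed node is passed in by hand (in a graph and tree-based design, a tree index would typically be the component that supplies starting points).

```python
import heapq
import math


def build_knn_graph(objects, k):
    """Link every object to its k nearest neighbors (brute force, for illustration)."""
    graph = {}
    for i, vec in enumerate(objects):
        others = [(math.dist(vec, other), j) for j, other in enumerate(objects) if j != i]
        graph[i] = [j for _, j in heapq.nsmallest(k, others)]
    return graph


def greedy_search(objects, graph, query, seed):
    """Move to the neighbor closest to the query until no neighbor improves."""
    current = seed
    current_dist = math.dist(objects[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(objects[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # Local minimum: no linked object is closer to the query.
            return current, current_dist
        current, current_dist = best, best_dist


if __name__ == "__main__":
    # Ten points on a line, each linked to its two nearest neighbors.
    objects = [[float(i), 0.0] for i in range(10)]
    graph = build_knn_graph(objects, k=2)
    print(greedy_search(objects, graph, [7.2, 0.0], seed=0))
```

A greedy walk like this only visits the objects along its path, so it can stop at a local minimum; that is why results from graph-based methods are approximate.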

Note: QG and QBG require the BLAS and LAPACK libraries. If you use only NGT (the graph and tree-based method) and do not want to install these additional libraries, as was the case with V1, you can disable QG and QBG with the NGT_QBG_DISABLED build option.

Installation

Build

Downloads

On Linux without QG and QBG

  $ unzip NGT-x.x.x.zip
  $ cd NGT-x.x.x
  $ mkdir build
  $ cd build
  $ cmake -DNGT_QBG_DISABLED=ON ..
  $ make
  $ make install
  $ ldconfig /usr/local/lib

On CentOS

  $ yum install blas-devel lapack-devel
  $ unzip NGT-x.x.x.zip
  $ cd NGT-x.x.x
  $ mkdir build
  $ cd build
  $ cmake ..
  $ make
  $ make install
  $ ldconfig /usr/local/lib

On Ubuntu

  $ apt install libblas-dev liblapack-dev
  $ unzip NGT-x.x.x.zip
  $ cd NGT-x.x.x
  $ mkdir build
  $ cd build
  $ cmake ..
  $ make
  $ make install
  $ ldconfig /usr/local/lib

On macOS using homebrew

  $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  $ brew install cmake
  $ brew install libomp
  $ unzip NGT-x.x.x.zip
  $ cd NGT-x.x.x
  $ mkdir build
  $ cd build
  $ cmake ..
  $ make
  $ make install

Pre-Built

On macOS

  $ brew install ngt

NGT (Graph and tree-based method)

Key Features

  • Supported operating systems: Linux and macOS
  • Objects can be added to and removed from an existing index.
  • Datasets larger than available memory can be handled using the shared memory (memory-mapped file) option.
  • Supported distance functions: L1, L2, Cosine similarity, Angular, Hamming, Jaccard, Poincare, and Lorentz
  • Data types: 4-byte floating point numbers and 1-byte unsigned integers
  • Supported languages: Python, Ruby, PHP, Rust, Go, C, and C++
  • Distributed servers: ngtd and vald
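
For readers unfamiliar with some of the distance functions listed above, the following sketch gives their textbook definitions in plain Python. These are illustrative assumptions: the function names are hypothetical, and NGT's own conventions (for example, how Jaccard or Hamming inputs are encoded, or how Angular is normalized) may differ. Poincare and Lorentz are omitted.

```python
import math


def l1(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def angular(a, b):
    """Angle in radians between two vectors (clamped against rounding error)."""
    return math.acos(max(-1.0, min(1.0, cosine_similarity(a, b))))


def hamming(a, b):
    """Number of differing bits between two equal-length byte sequences."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    print(l1([1, 2], [4, 6]), l2([0, 0], [3, 4]))
    print(hamming(bytes([0b1010]), bytes([0b0110])), jaccard({1, 2, 3}, {2, 3, 4}))
```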

Documents

Utilities

Supported Programming Languages

Build parameters

The following build parameters are available.

Shared memory use

The index can be placed in shared memory using memory-mapped files. Shared memory reduces the total memory needed when multiple processes use the same index. It also makes it possible to handle an index with too many objects to load into memory, and it shortens the time needed to open the index. Because this option changes how the library is built, add the following parameter when running "cmake" to enable shared memory.

  $ cmake -DNGT_SHARED_MEMORY_ALLOCATOR=ON ..

Note: Because there is no locking mechanism, the index should be used read-only when multiple processes share it.
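
The general idea behind a memory-mapped, read-only index can be illustrated with Python's standard `mmap` module. This is a minimal sketch of the operating system mechanism only, not NGT's allocator or file format; the record layout, `DIMENSION`, and the function names are assumptions made for the demo.

```python
import mmap
import os
import struct
import tempfile

DIMENSION = 4  # hypothetical vector size for this demo
RECORD = struct.Struct(f"<{DIMENSION}f")  # one little-endian float32 vector


def write_vectors(path, vectors):
    """Store vectors back to back as fixed-size binary records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


def read_vector(path, index):
    """Map the file read-only and decode one record without reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return RECORD.unpack_from(mapped, index * RECORD.size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
        print(read_vector(path, 1))
```

Several processes can map the same file this way and share the pages the operating system has cached, which is why read-only access needs no coordination while concurrent writes would.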

Large-scale data use

If you insert more than about 5 million objects when using the graph-based method, add the following parameter to improve search time.

  $ cmake -DNGT_LARGE_DATASET=ON ..

Disable QG and QBG

QG and QBG require the BLAS and LAPACK libraries. If you do not want to install these libraries and do not use QG or QBG, you can disable both methods.

  $ cmake -DNGT_QBG_DISABLED=ON ..

QG (Quantized graph-based method)

Key Features

  • Higher performance than the graph and tree-based method
  • Supported operating systems: Linux and macOS
  • Supported distance functions: L2 and Cosine similarity

Documents

Utilities

  • Command : qbg

Supported Programming Languages

  • C++
  • C
  • Python only for search

Build parameters

For QG, disabling rotation of the vector space and the use of residual vectors is recommended to improve performance, as follows.

  $ cmake -DNGTQG_NO_ROTATION=ON -DNGTQG_ZERO_GLOBAL=ON ..

QBG (Quantized blob graph-based method)

Key Features

  • QBG can handle billions of objects.
  • Supported operating systems: Linux and macOS
  • Supported distance functions: L2

Utilities

  • Command : qbg

Supported Programming Languages

  • C++
  • C
  • Python only for search

Benchmark Results

The following are the results of ANN Benchmarks for NGT v2.0.0, run with a timeout of 5 hours on an AWS c5.4xlarge instance.

glove-100-angular

gist-960-euclidean

fashion-mnist-784-euclidean

nytimes-256-angular

sift-128-euclidean

License

Copyright (C) 2015 Yahoo Japan Corporation

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributor License Agreement

This project requires contributors to accept the terms in the Contributor License Agreement (CLA).

Please note that contributors to the NGT repository on GitHub (https://github.com/yahoojapan/NGT) shall be deemed to have accepted the CLA without individual written agreements.

Contact Person

masajiro

Publications

ONNG
  • Iwasaki, M., Miyazaki, D.: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity. arXiv:1810.07355 [cs] (2018). (pdf)
PANNG
  • Iwasaki, M.: Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search. Proc. of SISAP2016 (2016) 20-33. (pdf)
  • Sugawara, K., Kobayashi, H. and Iwasaki, M.: On Approximately Searching for Similar Word Embeddings. Proc. of ACL2016 (2016) 2265-2275. (pdf)
ANNGT
  • Iwasaki, M.: Applying a Graph-Structured Index to Product Image Search (in Japanese). IIEEJ Journal 42(5) (2013) 633-641. (pdf)
  • Iwasaki, M.: Proximity search using approximate k nearest neighbor graph with a tree structured index (in Japanese). IPSJ Journal 52(2) (2011) 817-828. (pdf)
ANNG
  • Iwasaki, M.: Proximity search in metric spaces using approximate k nearest neighbor graph (in Japanese). IPSJ Trans. on Database 3(1) (2010) 18-28. (pdf)
