• Stars
    star
    170
  • Rank 215,405 (Top 5 %)
  • Language
    Rust
  • License
    MIT License
  • Created 6 months ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Annoy-inspired Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage 💥

arroy

License Crates.io Docs dependency status Build

Arroy (Approximate Rearest Reighbors Oh Yeah) is a Rust library with the interface of the Annoy Python library to search for vectors in space that are close to a given query vector. It is based on LMDB, a memory-mapped key-value store, so many processes may share the same data and atomically modify the vectors.

Background

There are some other libraries to do nearest neighbor search. However, most of them are memory-bound, and none use LMDB for their storage. Annoy considered using LMDB as a backend since 2015. We built Meilisearch on top of LMDB; therefore, it was an obvious choice. As Annoy highly inspires it, we benefit from the same low memory footprint.

Why is this useful? If you want to find the nearest neighbors and have many CPUs, you only need to build the index once. Any thread will be able to query the LMDB-based index and will be able to do lookups immediately, even while another index is modifying it.

We use it inside Meilisearch. This library helps our users search for similar documents. Our users have many millions of them in a high-dimensional space (i.e., 768 on average and 1536 for OpenAI), so memory usage is a prime concern.

Arroy was built by @Kerollmops and @irevoire with the help of @dureuill in a week by porting the original C++ source code of Annoy.

Summary of features

  • Euclidean distance, Manhattan distance, cosine distance, or Dot (Inner) Product distance
  • Cosine distance is equivalent to Euclidean distance of normalized vectors i.e., sqrt(2-2*cos(u, v))
  • Works better if you don't have too many dimensions (like <100) but seems to perform surprisingly well even up to 1,000 dimensions
  • Small memory usage
  • Lets you share memory between multiple processes using LMDB
  • Index creation is separate from lookup (in particular, you can not add more items once the tree has been created)
  • Build index on disk to enable indexing big datasets that won't fit into memory using LMDB
  • Multithreaded tree building using rayon
  • Additional features compared to Annoy
    • Filter when querying
    • Incrementally update the tree without rebuilding it from scratch
    • Store and modify different indexes atomically using LMDB (indexes are identified by an u16)
    • Modify the items list in place while performing queries using LMDB
    • Storage based on LMDB using LMDB
    • Safer to use API, i.e., check dimensions, distances, etc
    • The database size does not depend on the highest item ID but on the number of items
    • Generic over your random number generator

Missing features

  • No Python support
  • No Hamming distance support
  • Generally slower due to the log(n) lookups and non-aligned vectors due to LMDB

Tradeoffs

Only two main parameters are needed to tune Arroy: the number of trees n_trees and the number of nodes to inspect during searching search_k.

  • n_trees is provided during build time and affects the build time and the index size. A larger value will give more accurate results but larger indexes.
  • search_k is provided in runtime and affects the search performance. A larger value will give more accurate results but will take a longer time to return.

If search_k is not provided, it will default to n * n_trees where n is the number of approximate nearest neighbors. Otherwise, search_k and n_trees are roughly independent, i.e., the value of n_trees will not affect search time if search_k is held constant and vice versa. Basically, it's recommended to set n_trees as large as possible given the amount of memory you can afford, and it's recommended to set search_k as large as possible given the time constraints you have for the queries.

How does it work

Using random projections and by building up a tree. At every intermediate node in the tree, a random hyperplane is chosen, which divides the space into two subspaces. This hyperplane is determined by sampling two points from the subset and taking the hyperplane equidistant from them.

We do this k times so that we get a forest of trees. k has to be tuned to your needs by looking at what tradeoff you have between precision and performance.

Dot Product distance (originally contributed by @psobot and @pkorobov) reduces the provided vectors from dot (or "inner-product") space to a more query-friendly cosine space using a method by Bachrach et al., at Microsoft Research, published in 2014.

Source code

It's all written in Rust and based on LMDB without a handful of ugly optimizations for performance and memory usage. You have been warned :)

The code should support Windows, thanks to LMDB and the Rust programming language.

Big thanks to the open-source community

  • Thanks to Qdrant for their SIMD distances functions
  • Thanks to Spotify for the original idea of Annoy

More Repositories

1

meilisearch

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
Rust
43,248
star
2

meilisearch-js

JavaScript client for the Meilisearch API
TypeScript
672
star
3

meilisearch-php

PHP wrapper for the Meilisearch API
PHP
543
star
4

meilisearch-laravel-scout

MeiliSearch integration for Laravel Scout
PHP
467
star
5

milli

Search engine library for Meilisearch ⚡️
Rust
459
star
6

meilisearch-js-plugins

The search client to use Meilisearch with InstantSearch.
TypeScript
449
star
7

meilisearch-go

Golang wrapper for the Meilisearch API
Go
444
star
8

heed

A fully typed LMDB wrapper with minimum overhead 🐦
Rust
424
star
9

meilisearch-python

Python wrapper for the Meilisearch API
Python
400
star
10

MeiliES

A Rust based event store using the Redis protocol
Rust
326
star
11

meilisearch-rust

Rust wrapper for the Meilisearch API.
Rust
316
star
12

meilisearch-rails

Meilisearch integration for Ruby on Rails
Ruby
273
star
13

docs-scraper

Scrape documentation into Meilisearch
Python
257
star
14

meilisearch-dotnet

.NET wrapper for the Meilisearch API
C#
232
star
15

strapi-plugin-meilisearch

A strapi plugin to add your collections to Meilisearch
JavaScript
208
star
16

mini-dashboard

mini-dashboard for Meilisearch
JavaScript
204
star
17

charabia

Library used by Meilisearch to tokenize queries and documents
Rust
202
star
18

meilisearch-ruby

Ruby SDK for the Meilisearch API
Ruby
186
star
19

meilisearch-react

182
star
20

meilisearch-kubernetes

Meilisearch on Kubernetes Helm charts and manifests
Mustache
181
star
21

meilisearch-java

Java client for Meilisearch
Java
163
star
22

docs-searchbar.js

Front-end search bar for documentation with Meilisearch
JavaScript
161
star
23

meilisearch-vue

148
star
24

documentation

Meilisearch documentation
MDX
132
star
25

integration-guides

Central reference for Meilisearch integrations.
Shell
127
star
26

meilisearch-symfony

Seamless integration of Meilisearch into your Symfony project.
PHP
111
star
27

awesome-meilisearch

A curated list of awesome Meilisearch resources
88
star
28

meilisearch-swift

Swift client for the Meilisearch API
Swift
87
star
29

firestore-meilisearch

Fulltext search on Firebase with Meilisearch
TypeScript
83
star
30

meilisearch-dart

The Meilisearch API client written for Dart
Dart
73
star
31

ecommerce-demo

Nuxt 3 ecommerce site search with filtering and facets powered by Meilisearch
Vue
73
star
32

vuepress-plugin-meilisearch

Add a relevant and typo tolerant search bar to your VuePress
JavaScript
62
star
33

saas-demo

App search in a CRM use case, powered by Meilisearch
PHP
58
star
34

product

Public feedback and ideation discussions for Meilisearch product 🔮
55
star
35

meilisearch-wordpress

WordPress plugin for Meilisearch.
PHP
53
star
36

demos

A list of Meilisearch demos with open-source code and live preview ⚡️
CoffeeScript
43
star
37

gatsby-plugin-meilisearch

A plugin to index your Gatsby content to Meilisearch based on graphQL queries
JavaScript
40
star
38

demo-movies

A website that lets you know where to watch a movie
JavaScript
34
star
39

landing

Meilisearch's landing page
JavaScript
33
star
40

meilisearch-migration

Scripts to update Meilisearch version's.
Shell
32
star
41

devrel

Anything Developer Relations at Meili
CSS
27
star
42

meilisearch-digitalocean

Meilisearch services on DigitalOcean
Python
24
star
43

meilisearch-angular

Instant Meilisearch for Angular Framework
23
star
44

deserr

Deserialization library with focus on error handling
Rust
22
star
45

meilisearch-aws

AWS services for Meilisearch
Python
20
star
46

cargo-flaky

A cargo sub-command to helps you find flaky tests
Rust
20
star
47

meilisearch-gcp

Meilisearch services on GCP
Python
20
star
48

grenad

Tools to sort, merge, write, and read immutable key-value pairs 🍅
Rust
19
star
49

specifications

Track specification elaboration.
18
star
50

madness

an async mdns library for tokio
Rust
17
star
51

demo-finding-crates

Expose all crates from crates.io with MeiliSearch
Rust
17
star
52

scrapix

TypeScript
16
star
53

transplant

Rust
15
star
54

cloud-providers

☁ Meilisearch DevOps Tools for the Cloud ☁
Shell
15
star
55

engine-team

Repository gathering the development process of the core-team
13
star
56

meilisearch-importer

A CLI to import massive CSV and NdJson into Meilisearch
Rust
12
star
57

compute-embeddings

A small tool to compute the embeddings of a list of JSON documents
Rust
10
star
58

cloud-scripts

Cloud scripts for cloud provider agnostic configuration
Shell
9
star
59

demo-finding-rubygems

Alternative search bar for RubyGems
Ruby
8
star
60

minimeili-raft

A small implementation of a dummy Meilisearch running on top of Raft
Rust
7
star
61

demo-MoMA

A MeiliSearch demo using the Museum Of Modern Art Collection
JavaScript
6
star
62

strapi-plugin-meilisearch-v4

Work in progress
JavaScript
6
star
63

searchbar.js

wip
JavaScript
6
star
64

meili-aoc

meili-aoc
Rust
5
star
65

mini-search-engine-presentation

A simple and "short" presentation of the search engine
5
star
66

vercel-demo

A website that lets you know where to watch a movie built on Next.js and Meilisearch, deployed on Vercel with the Meilisearch + Vercel integration.
JavaScript
5
star
67

meilisearch-flutter

[wip] A basic UI kit with Meilisearch search widgets for Flutter
CMake
4
star
68

jayson

Rust
4
star
69

nelson

Rust
4
star
70

demo-finding-pypi

Alternative search bar for PyPI packages
Python
4
star
71

nextjs-starter-meilisearch-table

TypeScript
3
star
72

js-project-boilerplate

A boilerplate providing basic configuration for JavaScript projects in Meilisearch
3
star
73

synonyms

2
star
74

devops-tools

Shell
2
star
75

discord-bot-productboard

JavaScript
2
star
76

strois

A simple non-async S3 client based on the REST API
Rust
2
star
77

datasets

2
star
78

poc-vector-store-recall

A experimental tool that uses the vector store to increase Meilisearch's recall
Rust
2
star
79

.github

1
star
80

actions

Meilisearch Github Actions
JavaScript
1
star
81

devspector

Develop specification inspector
JavaScript
1
star
82

design-team

1
star
83

movies-react-demo

Created with CodeSandbox
HTML
1
star
84

ansible-vm-benchmarks

Ansible Playbook to index datasets on several typology of Instance on a specific Meilisearch version/commit
Rust
1
star
85

akamai-purge

A Rust helper to purge Akamai cache
Rust
1
star
86

poc-heed-codec

A repository to help us define the new design of heed
Rust
1
star
87

mainspector

Main specification inspector
JavaScript
1
star
88

settings_guessr

A tool that guess your settings by using the dataset
Rust
1
star
89

meilisearch-webhook-usage-example

Example of how to use the meilisearch webhook
Rust
1
star
90

meilikeeper

A sync zookeeper client on top of the official C client
Rust
1
star
91

massive-meilisearch-sampling

A program that generates and sends dataset and samples update/deletes to a Meilisearch server
Rust
1
star
92

zookeeper-client-sync

zookeeper-client-sync
Rust
1
star