  • Stars: 231
  • Rank: 169,691 (Top 4%)
  • Language: Python
  • License: Apache License 2.0
  • Created: almost 2 years ago
  • Updated: about 1 month ago

Repository Details

Framework for benchmarking vector search engines

vector-db-benchmark

Many vector search engines are available, and each offers a different set of features and a different level of efficiency. But how do we measure their performance? There is no single definition: in a given use case you may care about one aspect and pay little attention to others. This project is a general framework for benchmarking different engines under the same hardware constraints, so you can choose what works best for you.

Running any benchmark requires choosing an engine and a dataset, and defining the scenario against which the engine should be tested. A scenario may specify whether the server runs in single or distributed mode, which client implementation is used, and how many client instances run.

How to run a benchmark?

Benchmarks are implemented in server-client mode: the server runs on one machine and the client runs on another.

Run the server

All engines are served using docker compose. The configuration is in the servers directory (./engine/servers/).

To launch the server instance, run the following command:

cd ./engine/servers/<engine-configuration-name>
docker compose up

Containers are expected to expose all necessary ports, so the client can connect to them.
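
Before starting the client, it can help to confirm that the server's port is reachable from the client machine. The following is a minimal sketch using only the Python standard library; it is not part of the framework, and the function name is made up for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Call it with the host you would pass to --host and the port that your engine's compose file exposes, for example port_is_open("localhost", <port>).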

Run the client

Install dependencies:

pip install poetry
poetry install

Run the benchmark:

Usage: run.py [OPTIONS]

  Example: python3 -m run --engines *-m-16-* --datasets glove-*

Options:
  --engines TEXT                  [default: *]
  --datasets TEXT                 [default: *]
  --host TEXT                     [default: localhost]
  --skip-upload / --no-skip-upload
                                  [default: no-skip-upload]
  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

The command accepts wildcards for engines and datasets. Quote the patterns if your shell would otherwise expand them. Results of the benchmarks are stored in the ./results/ directory.
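
To get a feel for how such wildcards select configurations, here is a small sketch that assumes shell-style matching as provided by Python's fnmatch module. The engine and dataset names are invented for illustration, and the framework's own matching logic may differ.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical configuration names, used only for illustration.
engines = ["engine-a-m-16-ef-128", "engine-a-m-32-ef-128", "engine-b-m-16-ef-64"]
datasets = ["glove-25-angular", "glove-100-angular", "deep-image-96-angular"]

print(select(engines, "*-m-16-*"))  # ['engine-a-m-16-ef-128', 'engine-b-m-16-ef-64']
print(select(datasets, "glove-*"))  # ['glove-25-angular', 'glove-100-angular']
```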

How to update benchmark parameters?

Each engine has a configuration file that defines the parameters for the benchmark. Configuration files are located in the configuration directory.

Each step in the benchmark process reads a dedicated section of the configuration:

  • connection_params - passed to the client during the connection phase.
  • collection_params - parameters used to create the collection; indexing parameters are usually defined here.
  • upload_params - parameters used to upload the data to the server.
  • search_params - passed to the client during the search phase. The framework allows multiple search configurations within a single experiment run.

The exact parameter values are specific to each engine.
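
As a rough illustration of how these four sections might fit together, the sketch below models one experiment entry as a Python dictionary. Only the four section names come from the description above. Every other key and value, the helper function, and the idea that all four sections must be present are assumptions made for the example.

```python
import json

# Hypothetical experiment entry. Only the four *_params names are taken
# from the text; all other keys and values are placeholders.
experiment = {
    "name": "example-engine-m-16",
    "engine": "example-engine",
    "connection_params": {"timeout": 30},
    "collection_params": {"index": {"m": 16}},
    "upload_params": {"batch_size": 1024},
    # A list, so one experiment can carry several search configurations.
    "search_params": [
        {"top": 10, "ef": 64},
        {"top": 10, "ef": 128},
    ],
}

EXPECTED_SECTIONS = (
    "connection_params",
    "collection_params",
    "upload_params",
    "search_params",
)


def missing_sections(config: dict) -> list:
    """List the expected sections that are absent from a config entry."""
    return [section for section in EXPECTED_SECTIONS if section not in config]


print(missing_sections(experiment))
for index, params in enumerate(experiment["search_params"]):
    print(f"search configuration {index}: {json.dumps(params)}")
```

Keeping search_params as a list reflects the point above that one uploaded collection can be searched with several settings without repeating the upload.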

How to register a dataset?

Datasets are configured in the datasets/datasets.json file. The framework automatically downloads each dataset and stores it in the datasets directory.
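
The sketch below shows one way to add an entry to such a file programmatically. It assumes the file holds a JSON array of objects with a unique "name" field. That layout and every field in the sample entry are assumptions, so check the existing entries in datasets/datasets.json for the real schema before relying on it.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(registry_path: Path, entry: dict) -> None:
    """Append entry to a JSON array file, refusing duplicate names."""
    records = json.loads(registry_path.read_text()) if registry_path.exists() else []
    if any(record.get("name") == entry["name"] for record in records):
        raise ValueError(f"dataset {entry['name']!r} is already registered")
    records.append(entry)
    registry_path.write_text(json.dumps(records, indent=2) + "\n")


# Demonstration against a temporary file, so no real registry is modified.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datasets.json"
    # All fields below are hypothetical placeholders.
    register_dataset(path, {"name": "my-dataset", "vector_size": 128, "distance": "cosine"})
    print(path.read_text())
```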

How to implement a new engine?

There are a few base classes that you can use to implement a new engine.

  • BaseConfigurator - defines methods to create collections and set up indexing parameters.
  • BaseUploader - defines methods to upload the data to the server.
  • BaseSearcher - defines methods to search the data.

See the examples in the clients directory.

Once all the necessary classes are implemented, you can register the engine in the ClientFactory.
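
The division of labour between the three classes can be sketched with a toy in-memory engine. The base classes below are stand-ins that share only their names with the framework's classes. Their method names and signatures, the ENGINES registry, and the InMemoryStore helper are all invented for this example, so consult the clients directory for the real interfaces.

```python
from abc import ABC, abstractmethod


# Stand-in base classes; the framework's real classes define their own methods.
class BaseConfigurator(ABC):
    @abstractmethod
    def recreate(self, collection_params: dict) -> None: ...


class BaseUploader(ABC):
    @abstractmethod
    def upload(self, ids: list, vectors: list) -> None: ...


class BaseSearcher(ABC):
    @abstractmethod
    def search(self, vector: list, top: int) -> list: ...


class InMemoryStore:
    """Toy 'server' shared by the three client classes."""

    def __init__(self):
        self.points = {}


class ToyConfigurator(BaseConfigurator):
    def __init__(self, store):
        self.store = store

    def recreate(self, collection_params):
        # Start from an empty collection; the toy engine has no index to tune.
        self.store.points.clear()


class ToyUploader(BaseUploader):
    def __init__(self, store):
        self.store = store

    def upload(self, ids, vectors):
        self.store.points.update(zip(ids, vectors))


class ToySearcher(BaseSearcher):
    def __init__(self, store):
        self.store = store

    def search(self, vector, top):
        # Brute-force ranking by squared Euclidean distance.
        def distance(other):
            return sum((a - b) ** 2 for a, b in zip(vector, other))

        ranked = sorted(self.store.points.items(), key=lambda item: distance(item[1]))
        return [point_id for point_id, _ in ranked[:top]]


# Hypothetical registry standing in for registration in the ClientFactory.
ENGINES = {"toy": (ToyConfigurator, ToyUploader, ToySearcher)}

store = InMemoryStore()
configurator, uploader, searcher = (cls(store) for cls in ENGINES["toy"])
configurator.recreate({})
uploader.upload([1, 2, 3], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(searcher.search([0.9, 0.9], top=2))  # [2, 1]
```

Splitting configuration, upload and search into separate classes lets each benchmark phase be timed and parameterised on its own.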
