  • Stars: 272
  • Rank 151,235 (Top 3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 2 years ago
  • Updated about 2 months ago

Repository Details

Framework for benchmarking vector search engines

vector-db-benchmark

View results

Many vector search engines are available, and each offers a different set of features and a different level of efficiency. But how do we measure their performance? There is no single definition: in a given use case you may care about one aspect while paying little attention to others. This project is a general framework for benchmarking different engines under the same hardware constraints, so you can choose what works best for you.

Running a benchmark requires choosing an engine and a dataset, and defining the scenario to test against. A scenario may specify whether the server runs in single or distributed mode, which client implementation is used, and how many client instances run.

How to run a benchmark?

Benchmarks are implemented in server-client mode, meaning that the server runs on one machine and the client runs on another.

Run the server

All engines are served using docker compose. The configurations are in the servers directory (./engine/servers/).

To launch the server instance, run the following command:

cd ./engine/servers/<engine-configuration-name>
docker compose up

Containers are expected to expose all necessary ports, so the client can connect to them.
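
Before starting the client, it can help to confirm that the server port is reachable from the client machine. The following is a minimal sketch, not part of the framework: the helper name wait_for_port is hypothetical, and the host and port to pass depend on the compose file of the engine you launched.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

A call such as wait_for_port("localhost", port) with the port published by the container returns False if the port never opens within the timeout, which usually points to a missing port mapping or a firewall between the two machines.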

Run the client

Install dependencies:

pip install poetry
poetry install

Run the benchmark:

Usage: run.py [OPTIONS]

  Example: python3 -m run --engines *-m-16-* --datasets glove-*

Options:
  --engines TEXT                  [default: *]
  --datasets TEXT                 [default: *]
  --host TEXT                     [default: localhost]
  --skip-upload / --no-skip-upload
                                  [default: no-skip-upload]
  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

The command accepts wildcards for engines and datasets. Benchmark results are stored in the ./results/ directory.
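
To illustrate how wildcard patterns like the ones in the usage example select names, here is a small sketch using shell-style matching from the Python standard library. It is an assumption that the framework matches patterns this way, and the engine and dataset names below are hypothetical.

```python
from fnmatch import fnmatch


def select(names: list[str], pattern: str) -> list[str]:
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


# Hypothetical configuration and dataset names, for illustration only.
engines = ["qdrant-m-16-ef-128", "qdrant-m-32-ef-128", "milvus-m-16-ef-128"]
datasets = ["glove-25-angular", "glove-100-angular", "deep-image-96-angular"]

print(select(engines, "*-m-16-*"))
print(select(datasets, "glove-*"))
```

When typing patterns in a shell, quoting them (for example --datasets "glove-*") keeps the shell from expanding them against local file names before the command sees them.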

How to update benchmark parameters?

Each engine has a configuration file that defines the parameters for the benchmark. Configuration files are located in the configuration directory.

Each step in the benchmark process uses a dedicated section of the configuration:

  • connection_params - passed to the client during the connection phase.
  • collection_params - parameters used to create the collection; indexing parameters are usually defined here.
  • upload_params - parameters used to upload the data to the server.
  • search_params - passed to the client during the search phase. The framework allows multiple search configurations for the same experiment run.

The exact parameter values are specific to each engine.
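
The four sections above can be pictured with a small sketch. Only the four top-level section names come from the list above; the "name" key, every nested key and every value are illustrative assumptions, and the helper plan_steps is hypothetical rather than part of the framework.

```python
import json

# Hypothetical experiment entry. Only the four *_params keys are taken from
# the documentation; everything else is an assumption for illustration.
experiment = {
    "name": "example-engine-m-16-ef-128",
    "connection_params": {"timeout": 30},
    "collection_params": {"index": {"m": 16, "ef_construct": 128}},
    "upload_params": {"batch_size": 1024, "parallel": 4},
    # A list, since one run may use several search configurations.
    "search_params": [
        {"parallel": 1, "ef": 64},
        {"parallel": 8, "ef": 128},
    ],
}


def plan_steps(exp: dict) -> list[tuple[str, dict]]:
    """List the benchmark phases and the parameters each one would receive."""
    steps = [
        ("connect", exp["connection_params"]),
        ("create_collection", exp["collection_params"]),
        ("upload", exp["upload_params"]),
    ]
    # One search phase per search configuration.
    steps += [("search", params) for params in exp["search_params"]]
    return steps


for phase, params in plan_steps(experiment):
    print(phase, json.dumps(params))
```

Keeping search_params as a list means the data is uploaded and indexed once, after which each search configuration runs against the same collection.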

How to register a dataset?

Datasets are configured in the datasets/datasets.json file. The framework automatically downloads each dataset and stores it in the datasets directory.
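
As a rough sketch of what registering a dataset could look like, the helper below appends an entry to a JSON file. It assumes the file holds a JSON list of objects, each with a "name" field; the function register_dataset, the field names and the URL are all hypothetical, so check the existing entries in datasets/datasets.json for the actual schema.

```python
import json
from pathlib import Path


def register_dataset(registry: Path, entry: dict) -> None:
    """Append an entry to a JSON list of datasets, refusing duplicate names."""
    datasets = json.loads(registry.read_text()) if registry.exists() else []
    if any(d["name"] == entry["name"] for d in datasets):
        raise ValueError(f"dataset {entry['name']!r} is already registered")
    datasets.append(entry)
    registry.write_text(json.dumps(datasets, indent=2) + "\n")


# Hypothetical entry: field names and URL are illustrative assumptions.
example_entry = {
    "name": "my-vectors-128-cosine",
    "vector_size": 128,
    "distance": "cosine",
    "link": "https://example.com/my-vectors-128.tgz",
}
```

Editing the file by hand works just as well; the duplicate-name check is there only because the --datasets option selects datasets by name.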

How to implement a new engine?

There are a few base classes that you can use to implement a new engine.

  • BaseConfigurator - defines methods to create collections and set up indexing parameters.
  • BaseUploader - defines methods to upload the data to the server.
  • BaseSearcher - defines methods to search the data.

See the examples in the clients directory.

Once all the necessary classes are implemented, you can register the engine in the ClientFactory.
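
The shape of a new engine can be sketched with a toy in-memory example. The three base classes below are stand-ins written for this sketch: the framework's real base classes share the names, but their method names and signatures are assumptions here, as is the ENGINES dictionary that stands in for registration in the ClientFactory.

```python
from abc import ABC, abstractmethod


# Stand-in base classes. The real ones live in the framework and their
# methods may differ; these exist only to make the sketch runnable.
class BaseConfigurator(ABC):
    @abstractmethod
    def recreate(self, collection_params: dict) -> None: ...


class BaseUploader(ABC):
    @abstractmethod
    def upload_batch(self, ids: list[int], vectors: list[list[float]]) -> None: ...


class BaseSearcher(ABC):
    @abstractmethod
    def search_one(self, vector: list[float], top: int) -> list[tuple[int, float]]: ...


# A toy in-memory "engine" shared by the three classes below.
STORE: dict[int, list[float]] = {}


class ToyConfigurator(BaseConfigurator):
    def recreate(self, collection_params: dict) -> None:
        STORE.clear()  # start every experiment from an empty collection


class ToyUploader(BaseUploader):
    def upload_batch(self, ids, vectors):
        STORE.update(zip(ids, vectors))


class ToySearcher(BaseSearcher):
    def search_one(self, vector, top):
        # Exact search by squared Euclidean distance, smallest first.
        scored = [
            (idx, sum((a - b) ** 2 for a, b in zip(vector, stored)))
            for idx, stored in STORE.items()
        ]
        return sorted(scored, key=lambda pair: pair[1])[:top]


# Hypothetical registry, standing in for registration in the ClientFactory.
ENGINES = {"toy": (ToyConfigurator, ToyUploader, ToySearcher)}

configurator_cls, uploader_cls, searcher_cls = ENGINES["toy"]
configurator_cls().recreate({})
uploader_cls().upload_batch([1, 2, 3], [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(searcher_cls().search_one([0.9, 0.9], top=2))
```

Splitting configuration, upload and search into separate classes mirrors the benchmark phases, so each phase can be timed and parameterized on its own. For the real method signatures, see the examples in the clients directory.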
