  • Stars: 913
  • Rank: 50,033 (Top 1.0 %)
  • Language: C
  • License: Apache License 2.0
  • Created over 1 year ago
  • Updated about 2 months ago


Repository Details

Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD on AVX2, AVX-512, NEON, SVE, & SVE2 📐

SimSIMD 📐

Hardware-Accelerated Similarity Metrics and Distance Functions

  • Zero-dependency header-only C99 library.
  • Bindings for Python, Rust and JavaScript.
  • Targets ARM NEON, SVE, x86 AVX2, AVX-512 (VNNI, FP16) hardware backends.
  • Zero-copy compatible with NumPy, PyTorch, TensorFlow, and other tensors.
  • Handles f64 double-, f32 single-, and f16 half-precision, i8 integral, and binary vectors.
  • Up to 200x faster than scipy.spatial.distance and numpy.inner.
  • Used in USearch and several DBMS products.

Implemented distance functions include:

  • Euclidean (L2), Inner Distance, and Cosine (Angular) spatial distances.
  • Hamming (~ Manhattan) and Jaccard (~ Tanimoto) binary distances.
  • Kullback-Leibler and Jensen–Shannon divergences for probability distributions.
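To make the listed metrics concrete, here is a minimal pure-Python reference sketch of what the spatial distances and divergences compute. It is not SimSIMD's implementation. Two conventions here are assumptions: "inner distance" is taken as 1 minus the dot product, and the Jensen–Shannon value is the divergence itself. Note that scipy.spatial.distance.jensenshannon returns the square root of this divergence, and this README does not state which convention SimSIMD follows.

```python
import math

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def sqeuclidean(a, b):
    # Squared Euclidean (L2) distance: the final square root is skipped
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inner_distance(a, b):
    # Assumed convention: 1 - dot product, most meaningful for unit-length vectors
    return 1.0 - sum(x * y for x, y in zip(a, b))

def kl_divergence(p, q):
    # KL(P || Q) = sum p_i * ln(p_i / q_i); terms with p_i == 0 contribute nothing,
    # and q_i must be positive wherever p_i is positive
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Jensen-Shannon divergence: mean KL divergence of P and Q from their midpoint M
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(sqeuclidean([1, 2, 3], [4, 5, 6]))  # 27
```

For unit-length inputs, cosine_distance and inner_distance coincide. This is why an inner-product metric is often used as a cheaper stand-in for cosine on pre-normalized embeddings.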

Technical Insights and related articles:

Benchmarks

Apple M2 Pro

Given 1000 embeddings from OpenAI Ada API with 1536 dimensions, running on the Apple M2 Pro Arm CPU with NEON support, here's how SimSIMD performs against conventional methods:

Kind | f32 improvement | f16 improvement | i8 improvement | Conventional method | SimSIMD
Cosine | 32x | 79x | 133x | scipy.spatial.distance.cosine | cosine
Euclidean² | 5x | 26x | 17x | scipy.spatial.distance.sqeuclidean | sqeuclidean
Inner Distance | 2x | 9x | 18x | numpy.inner | inner
Jensen-Shannon | 31x | 53x | - | scipy.spatial.distance.jensenshannon | jensenshannon

Intel Sapphire Rapids

On the Intel Sapphire Rapids platform, SimSIMD was benchmarked against auto-vectorized code compiled with GCC 12. GCC handles single-precision floats well but may not be the best choice for int8 and _Float16 arrays. Note that _Float16 is not part of C11: it is defined in ISO/IEC TS 18661-3 and supported by GCC and Clang as an extension on some targets.

Kind | GCC 12 f32 | GCC 12 f16 | SimSIMD f16 | f16 improvement
Cosine | 3.28 M/s | 336.29 k/s | 6.88 M/s | 20x
Euclidean² | 4.62 M/s | 147.25 k/s | 5.32 M/s | 36x
Inner Distance | 3.81 M/s | 192.02 k/s | 5.99 M/s | 31x
Jensen-Shannon | 1.18 M/s | 18.13 k/s | 2.14 M/s | 118x
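The "f16 improvement" column appears to be the SimSIMD f16 throughput divided by the GCC 12 f16 throughput. The snippet below recomputes it from the table's own numbers as a sanity check. It assumes "M/s" and "k/s" mean 10^6 and 10^3 per second. The ratio itself does not depend on what unit of work is being counted.

```python
# Throughput values copied from the table above; "M/s" and "k/s" are read as
# 10**6 and 10**3 per second (an assumption about the units)
throughput = {
    # kind: (GCC 12 f16, SimSIMD f16)
    "Cosine": (336.29e3, 6.88e6),
    "Euclidean²": (147.25e3, 5.32e6),
    "Inner Distance": (192.02e3, 5.99e6),
    "Jensen-Shannon": (18.13e3, 2.14e6),
}

# Speedup = SimSIMD throughput / GCC throughput, rounded to a whole factor
improvement = {kind: round(simd / gcc) for kind, (gcc, simd) in throughput.items()}
print(improvement)
```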

Broader Benchmarking Results:

Using SimSIMD in Python

Installation

pip install simsimd

Distance Between 2 Vectors

import simsimd
import numpy as np

vec1 = np.random.randn(1536).astype(np.float32)
vec2 = np.random.randn(1536).astype(np.float32)
dist = simsimd.cosine(vec1, vec2)

Supported functions include cosine, inner, sqeuclidean, hamming, and jaccard.
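The binary metrics, hamming and jaccard, operate on bit vectors rather than floats. Below is a minimal pure-Python sketch of what they compute. It assumes bits are packed eight per byte, which is how the b8 type name suggests binary vectors are stored. Returning 0.0 when both vectors are all zeros is also an assumption, not documented SimSIMD behavior.

```python
def popcount(byte):
    # Number of set bits in one byte
    return bin(byte).count("1")

def hamming(a, b):
    # Count of differing bits between two equally sized, bit-packed vectors
    return sum(popcount(x ^ y) for x, y in zip(a, b))

def jaccard(a, b):
    # 1 - |A AND B| / |A OR B| over set bits; two all-zero vectors are
    # treated as identical (distance 0.0), which is an assumption
    intersection = sum(popcount(x & y) for x, y in zip(a, b))
    union = sum(popcount(x | y) for x, y in zip(a, b))
    return 0.0 if union == 0 else 1.0 - intersection / union

a = bytes([0b10110000, 0b00001111])
b = bytes([0b10010001, 0b00001111])
print(hamming(a, b), jaccard(a, b))  # 2 0.25
```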

Distance Between 2 Batches

batch1 = np.random.randn(100, 1536).astype(np.float32)
batch2 = np.random.randn(100, 1536).astype(np.float32)
dist = simsimd.cosine(batch1, batch2)

Both batches must contain the same number of vectors, unless one of them contains a single vector; in that case, the single vector is broadcast against every vector of the other batch.
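The pairing rule for batches can be expressed in a few lines. This is a hypothetical pure-Python helper (paired_distances is not part of SimSIMD) that mirrors the described behavior. Equal-sized batches are compared row by row, a single-vector batch is broadcast, and any other combination is rejected.

```python
def sqeuclidean(a, b):
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def paired_distances(batch1, batch2, metric):
    # Hypothetical helper that mirrors the pairing rule described above
    n1, n2 = len(batch1), len(batch2)
    if n1 == n2:
        pairs = zip(batch1, batch2)  # row i is compared with row i
    elif n1 == 1:
        pairs = ((batch1[0], row) for row in batch2)  # broadcast the single vector
    elif n2 == 1:
        pairs = ((row, batch2[0]) for row in batch1)
    else:
        raise ValueError(f"cannot pair {n1} vectors with {n2} vectors")
    return [metric(x, y) for x, y in pairs]

print(paired_distances([[0, 0]], [[1, 0], [0, 2]], sqeuclidean))  # [1, 4]
```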

All Pairwise Distances

For calculating distances between all possible pairs of rows across two matrices (akin to scipy.spatial.distance.cdist):

matrix1 = np.random.randn(1000, 1536).astype(np.float32)
matrix2 = np.random.randn(10, 1536).astype(np.float32)
distances = simsimd.cdist(matrix1, matrix2, metric="cosine")
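As a reference for what "all pairwise distances" means, here is a pure-Python sketch of the computation. The name cdist below is a local illustration, not simsimd.cdist. Following the scipy.spatial.distance.cdist convention, the result has one row per row of the first matrix and one column per row of the second. For the 1000 × 1536 and 10 × 1536 inputs above, you would therefore expect a 1000 × 10 result.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def cdist(matrix1, matrix2, metric):
    # Entry [i][j] holds the distance between row i of matrix1 and row j of matrix2,
    # so the result has len(matrix1) rows and len(matrix2) columns
    return [[metric(row1, row2) for row2 in matrix2] for row1 in matrix1]

result = cdist([[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 1]], cosine_distance)
print(len(result), len(result[0]))  # 3 2
```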

Multithreading

By default, computations use a single CPU core. To use all CPU cores on Linux, pass the threads=0 argument, or specify a custom number of threads:

distances = simsimd.cdist(matrix1, matrix2, metric="cosine", threads=0)
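One common way to spread a pairwise-distance job across threads is to split the rows of the first matrix into contiguous chunks, one per worker, and concatenate the partial results in order. The sketch below illustrates that partitioning. It is an assumption for illustration, not SimSIMD's actual C-level scheduling, and cdist_chunked is a hypothetical name. With CPython's GIL, these pure-Python threads will not actually run faster; the point is only the chunking logic and the threads=0 meaning "all cores".

```python
import os
from concurrent.futures import ThreadPoolExecutor

def sqeuclidean(a, b):
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cdist_chunked(matrix1, matrix2, metric, threads=1):
    # Hypothetical helper: threads=0 means "use every core", as described above
    if threads == 0:
        threads = os.cpu_count() or 1
    if not matrix1:
        return []
    threads = max(1, min(threads, len(matrix1)))
    chunk = -(-len(matrix1) // threads)  # ceiling division: rows per worker
    slices = [matrix1[i:i + chunk] for i in range(0, len(matrix1), chunk)]

    def work(rows):
        # Each worker computes a contiguous block of output rows
        return [[metric(r1, r2) for r2 in matrix2] for r1 in rows]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(work, slices))  # map preserves input order
    return [row for part in parts for row in part]
```

Because each worker owns a disjoint, contiguous range of output rows, no synchronization is needed beyond collecting the results.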

Hardware Backend Capabilities

To view a list of hardware backends that SimSIMD supports:

print(simsimd.get_capabilities())

Using Python API with USearch

Want to use SimSIMD from Python together with USearch? You can wrap the raw C function pointers of SimSIMD backends into a CompiledMetric and pass it to USearch, similar to how USearch handles Numba's JIT-compiled code.

from usearch.index import Index, CompiledMetric, MetricKind, MetricSignature
from simsimd import pointer_to_sqeuclidean, pointer_to_cosine, pointer_to_inner

metric = CompiledMetric(
    pointer=pointer_to_cosine("f16"),
    kind=MetricKind.Cos,
    signature=MetricSignature.ArrayArraySize,
)

index = Index(256, metric=metric)

Using SimSIMD in Rust

To install, add the following to your Cargo.toml:

[dependencies]
simsimd = "..."

To use it:

use simsimd::{cosine, sqeuclidean};

fn main() {
    let vector_a = vec![1.0, 2.0, 3.0];
    let vector_b = vec![4.0, 5.0, 6.0];

    let distance = cosine(&vector_a, &vector_b);
    println!("Cosine Distance: {}", distance);

    let distance = sqeuclidean(&vector_a, &vector_b);
    println!("Squared Euclidean Distance: {}", distance);
}

Using SimSIMD in JavaScript

To install, choose one of the following options depending on your environment:

  • npm install --save simsimd
  • yarn add simsimd
  • pnpm add simsimd
  • bun install simsimd

The package ships with prebuilt binaries for Node.js v10 and above on Linux (x86_64, arm64), macOS (x86_64, arm64), and Windows (i386, x86_64).

If your platform is not supported, you can build the package from source via npm run build. This happens automatically unless you install the package with the --ignore-scripts flag or use Bun.

After you install it, you will be able to call the SimSIMD functions on various TypedArray variants:

const { sqeuclidean, cosine, inner, hamming, jaccard } = require('simsimd');

const vectorA = new Float32Array([1.0, 2.0, 3.0]);
const vectorB = new Float32Array([4.0, 5.0, 6.0]);

const distance = sqeuclidean(vectorA, vectorB);
console.log('Squared Euclidean Distance:', distance);

Using SimSIMD in C

For integration within a CMake-based project, add the following segment to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
    simsimd
    GIT_REPOSITORY https://github.com/ashvardanian/simsimd.git
    GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(simsimd)

If you plan to use the _Float16 functionality of SimSIMD, make sure your compiler supports C11 and the _Float16 type; for everything else, C99 compatibility suffices. A minimal usage example:

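#include <simsimd/simsimd.h>

int main() {
    simsimd_f32_t vector_a[1536];
    simsimd_f32_t vector_b[1536];
    // Fill both inputs: reading uninitialized arrays yields meaningless distances
    for (int i = 0; i != 1536; ++i) {
        vector_a[i] = (simsimd_f32_t)(i % 7 + 1);
        vector_b[i] = (simsimd_f32_t)(i % 5 + 1);
    }
    // Calls the AVX-512 kernel directly, so the CPU must support AVX-512
    simsimd_f32_t distance = simsimd_avx512_f32_cos(vector_a, vector_b, 1536);
    return 0;
}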

All function names follow the same pattern: simsimd_{backend}_{type}_{metric}.

  • The backend can be avx512, avx2, neon, or sve.
  • The type can be f64, f32, f16, i8, or b8.
  • The metric can be cos, ip, l2sq, hamming, jaccard, kl, or js.
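The naming pattern is mechanical, so the full set of candidate names can be generated and parsed programmatically. Here is a small Python sketch using exactly the backends, types, and metrics listed above. The pattern alone does not guarantee that every combination is actually implemented; for example, a metric defined for bit vectors need not exist for every float type. Treat the generated list as candidates only.

```python
from itertools import product

# Values listed above; not every combination is necessarily implemented
BACKENDS = ["avx512", "avx2", "neon", "sve"]
TYPES = ["f64", "f32", "f16", "i8", "b8"]
METRICS = ["cos", "ip", "l2sq", "hamming", "jaccard", "kl", "js"]

def symbol_name(backend, dtype, metric):
    # Builds a name following simsimd_{backend}_{type}_{metric}
    return f"simsimd_{backend}_{dtype}_{metric}"

def parse_symbol(name):
    # Splits a name back into (backend, type, metric); none of the parts contain "_"
    prefix, backend, dtype, metric = name.split("_", 3)
    if prefix != "simsimd":
        raise ValueError(f"not a SimSIMD symbol: {name}")
    return backend, dtype, metric

candidates = [symbol_name(*combo) for combo in product(BACKENDS, TYPES, METRICS)]
print(len(candidates))  # 140 candidate names
```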

To avoid hard-coding the backend, use simsimd_metric_punned_t to pun the function pointer and the simsimd_capabilities function to detect the available backends at runtime.
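The runtime-dispatch idea is to query which backends the CPU supports, then pick the most capable kernel that is both implemented and supported. The Python sketch below models only that selection logic; in C, the same role is played by simsimd_capabilities and a punned function pointer. Several details are assumptions: the preference order, the "serial" fallback name, and the pick_kernel helper are all hypothetical and not taken from SimSIMD.

```python
from typing import Callable, Dict, Iterable, List

Kernel = Callable[[List[float], List[float]], float]

# Hypothetical preference order, most capable backend first; "serial" stands for
# a portable fallback and is an assumed name, not one listed in this README
PREFERENCE = ["avx512", "avx2", "sve", "neon", "serial"]

def pick_kernel(kernels: Dict[str, Kernel], available: Iterable[str]) -> Kernel:
    # Return the first kernel whose backend is both implemented and supported by the CPU
    usable = set(available) | {"serial"}
    for backend in PREFERENCE:
        if backend in usable and backend in kernels:
            return kernels[backend]
    raise LookupError("no usable backend")
```

Resolving the kernel once and caching the resulting function is what makes this cheap. Per-call dispatch then costs only an indirect call.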

Benchmarking and Contributing

To rerun the experiments, use the following commands:

cmake -DCMAKE_BUILD_TYPE=Release -DSIMSIMD_BUILD_BENCHMARKS=1 -B ./build_release
cmake --build build_release --config Release
./build_release/simsimd_bench
./build_release/simsimd_bench --benchmark_filter=js

To test and benchmark with Python bindings:

pip install -e .
pytest python/test.py -s -x 

pip install numpy scipy scikit-learn # for comparison baselines
python python/bench.py # to run default benchmarks
python python/bench.py --n 1000 --ndim 1000000 # batch size and dimensions

To test and benchmark JavaScript bindings:

npm install --dev
npm test
npm run bench

To test and benchmark the Go bindings:

cd golang
go test # To test
go test -run=^$ -bench=. -benchmem # To benchmark

To test and benchmark Rust bindings:

cargo test 
cargo bench 
open ./target/criterion/report/index.html

More Repositories

  • StringZilla (C++, 2,094 stars): Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc. 🦖
  • SwiftSemanticSearch (Swift, 81 stars): Real-time on-device text-to-image and image-to-image Semantic Search with video stream capture using USearch & UForm AI Swift SDKs for Apple devices
  • ParallelReductionsBenchmark (C++, 73 stars): Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast!
  • usearch-molecules (Python, 46 stars): Searching for structural similarities across billions of molecules in milliseconds
  • memchr_vs_stringzilla (Rust, 45 stars): memchr vs stringzilla - up to 7x throughput difference between two SIMD-accelerated substring search libraries in Rust
  • usearch-images (Python, 38 stars): Semantic Search demo featuring UForm, USearch, UCall, and StreamLit, to visualize and retrieve from image datasets, similar to "CLIP Retrieval"
  • BenchmarkingTutorial (C++, 26 stars): Google Benchmark tutorial for C/C++ developers diving into High-Performance Computing and Numerical Methods ⏱️
  • usearch-binary (Jupyter Notebook, 19 stars): Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread
  • tinysemver (Python, 16 stars): Tiny Semantic Versioning (SemVer) library and GitHub CI that doesn't depend on 300K lines of JavaScript code and fits in a single Python file
  • cpp-cuda-python-starter-kit (Cuda, 16 stars): Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
  • abusing-vector-search (Python, 12 stars): Example of using Vector Search algorithms for non-traditional workloads, like GIS, stock prices, and sets
  • MongooseMiner (Python, 11 stars): Documentation retrieval system to help LLMs navigate less-popular (yet often more powerful) Python libraries
  • HashTableBenchmark (C++, 11 stars): A simple cross-platform speed & memory-efficiency benchmark for the most common hash-table implementations in the C++ world
  • LibSee (C, 11 stars): Link to this library and it will log all the LibC functions you are calling and how much time you are spending in them!
  • extrapolaTED (Jupyter Notebook, 8 stars): Bringing TED experiences to every topic with Gen AI
  • affine-gaps (Python, 7 stars): Less-wrong single-file Numba-accelerated Python implementation of Gotoh affine gap penalty extensions for the Needleman–Wunsch, Smith-Waterman, and Levenshtein algorithms for sequence alignment
  • AssemblyStats (Jupyter Notebook, 6 stars): A research project highlighting the rarity of SIMD instructions in modern software
  • TenPack (C++, 6 stars): Fast Tensors Packaging library for text, image, video, and audio data compatible with PyTorch, TensorFlow, & NumPy 🖼️🎵🎥 ➡️ 🧠
  • scaling-democracy (Cuda, 5 stars): GPU-accelerated Schulze voting method in Python, Numba, and CUDA, using ideas from Algebraic Graph Theory
  • acid-redis (CMake, 4 stars): Tiny Redis-like Persistent ACID Store on RocksDB with JSON-RPC using UCall and UStore
  • PolyglotBot (Python, 4 stars): Bot we've built for the Poe.com hackathon at the AGI house to gather results from multiple LLMs and trigger specialized models on-demand
  • HaversineSimSIMD (C++, 4 stars): Staging area for Haversine distance computations in SimSIMD and USearch
  • image-search (Python, 3 stars): Semantic Image Search Server with UForm, USearch, UCall
  • spacev-1b (1 star): Billion-scale Semantic Search dataset derived from Microsoft SpaceV for Vector Search benchmarks
  • AppResources (1 star): Collection of resources for app development
  • CppNeuralSTL (C++, 1 star): Simple neural network models from scratch using only C++ STL