• Stars
    star
    482
  • Rank 87,654 (Top 2 %)
  • Language
    C++
  • License
    MIT License
  • Created over 1 year ago
  • Updated 10 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A SQLite extension for efficient vector search, based on Faiss!

sqlite-vss

sqlite-vss (SQLite Vector Similarity Search) is a SQLite extension that brings vector search capabilities to SQLite, based on Faiss. It can be used to build semantic search engines, recommendations, or questions-and-answering tools.

See Introducing sqlite-vss: A SQLite Extension for Vector Search (February 2023) for more details and a live example!

If your company or organization finds this library useful, consider supporting my work!

Usage

.load ./vector0
.load ./vss0

select vss_version(); -- 'v0.0.1'

sqlite-vss has a similar API to the fts5 Full-Text Search Extension. Use the vss0 module to create virtual tables that can efficiently store and query your vectors.

-- 384 == number of dimensions for this example
create virtual table vss_articles using vss0(
  headline_embedding(384),
  description_embedding(384),
);

sqlite-vss is a Bring-your-own-vectors database, it is compatable with any embedding or vector data you have. Consider using OpenAI's Embeddings API, HuggingFace's Inference API, sentence-transformers, or any of these open source model. In this example, we are using sentence-transformers/all-MiniLM-L6-v2 to generate embeddings from our text, which have 384 dimensions.

You can insert vectors into vss0 tables as JSON or raw bytes.

insert into vss_articles(rowid, headline_embedding)
  select rowid, headline_embedding from articles;

To query for similar vectors ("k nearest neighbors"), use the vss_search function in the WHERE clause. Here we are searching for the 100 nearest neighbors to the embedding in row #123 in the articles table.

select rowid, distance
from vss_articles
where vss_search(
  headline_embedding,
  (select headline_embedding from articles where rowid = 123)
)
limit 100;

You can INSERT and DELETE into these tables as necessary, but UPDATE operations aren't supported yet. This can be used with triggers for automatically updated indexes. Also note that "small" INSERT/DELETE operations that only insert a few rows can be slow, so batch where necessary.

begin;

delete from vss_articles
  where rowid between 100 and 200;

insert into vss_articles(rowid, headline_embedding, description_embedding)
  values (:rowid, :headline_embedding, :description_embedding)

commit;

You can pass in custom Faiss factory strings for specific columns to control how the Faiss index is stored and queried. By default the factory string is "Flat,IDMap2", which can be slow to query as your database grows. Here, we add an inverted file index with 4096 centroids, a non-exhaustive option that makes large database queries much faster.

create virtual table vss_ivf_articles using vss0(
  headline_embedding(384) factory="IVF4096,Flat,IDMap2",
  description_embedding(384) factory="IVF4096,Flat,IDMap2"
);

This IVF will require training! You can define training data with a INSERT command in a single transaction, with the special operation="training" constraint.

insert into vss_ivf_articles(operation, headline_embedding, description_embedding)
  select
    'training',
    headline_embedding,
    description_embedding
  from articles;

Beware! Indexes that require training can take a long time. With the News Category Dataset (386 dimension over 210k vectors) that this example is based on, the default index would take 8 seconds to build. But with the custom "IVF4096,Flat,IDMap2" factory, it took 45 minutes to train and 4.5 minutes to insert data! This likely can be reduced with a smaller training set, but the faster queries can be helpful.

Documentation

See docs.md for a instructions to compile sqlite-vss yourself, as well as a full SQL API reference.

Installing

The Releases page contains pre-built binaries for Linux x86_64 and MacOS x86_64 (MacOS Big Sur 11 or higher). More pre-compiled targets will be available in the future. Additionally, sqlite-vss is distributed on common package managers like pip for Python and npm for Node.js, see below for details.

Do note that on Linux machines, you'll have to install some packages to make these options work:

sudo apt-get update
sudo apt-get install -y libgomp1 libatlas-base-dev liblapack-dev

Note: The 0 in the filename (vss0.dylib/ vss0.so) denotes the major version of sqlite-vss. Currently sqlite-vss is pre v1, so expect breaking changes in future versions.

Language Install More Info
Python pip install sqlite-vss sqlite-vss with Python PyPI
Datasette datasette install datasette-sqlite-vss sqlite-vss with Datasette Datasette
Node.js npm install sqlite-vss sqlite-vss with Node.js npm
Deno deno.land/x/sqlite_vss sqlite-vss with Deno deno version
Ruby gem install sqlite-vss sqlite-vss with Ruby Gem
Elixir hex.pm/packages/sqlite_vss sqlite-vss with Elixir Hex.pm
Go go get -u github.com/asg017/sqlite-vss/bindings/go sqlite-vss with Go Go Reference
Rust cargo add sqlite-vss sqlite-vss with Rust Crates.io
Github Release GitHub tag (latest SemVer pre-release)

With the sqlite3 CLI

For using sqlite-vss with the official SQLite command line shell, download the vector0.dylib/vss0.dylib (for MacOS Big Sur 11 or higher) or vector0.so/vss0.so (Linux) files from a release and load it into your SQLite environment.

The vector0 extension is a required dependency, so make sure to load that before vss0.

.load ./vector0
.load ./vss0
select vss_version();
-- v0.0.1

Python

For Python developers, install the sqlite-vss package with:

pip install sqlite-vss
import sqlite3
import sqlite_vss

db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
sqlite_vss.load(db)

version, = db.execute('select vss_version()').fetchone()
print(version)

See bindings/python for more details.

Node.js

For Node.js developers, install the sqlite-vss npm package with:

npm install sqlite-vss
import Database from "better-sqlite3"; // also compatible with node-sqlite3
import * as sqlite_vss from "sqlite-ulid";

const db = new Database(":memory:");
sqlite_ulid.load(db);

const version = db.prepare("select vss_version()").pluck().get();
console.log(version);

See npm/sqlite-vss/README.md for more details.

Deno

For Deno developers, use the deno.land/x/sqlite_vss module:

// Requires all permissions (-A) and the --unstable flag

import { Database } from "https://deno.land/x/[email protected]/mod.ts";
import * as sqlite_vss from "https://deno.land/x/sqlite_vss/mod.ts";

const db = new Database(":memory:");

db.enableLoadExtension = true;
sqlite_vss.load(db);

const [version] = db.prepare("select vss_version()").value<[string]>()!;

console.log(version);

See deno/sqlite-vss/README.md for more details.

Datasette

And for Datasette, install the datasette-sqlite-vss plugin with:

datasette install datasette-sqlite-vss

See bindings/datasette for more details.

Disadvantages

  • The underlying Faiss indicies are capped at 1GB. Follow #1 for updates.
  • Additional filtering on top of KNN searches aren't supported yet. Follow #2 for updates.
  • Only CPU Faiss indicies are supported, not GPU yet. Follow #3 for updates.
  • mmap'ed indices aren't supported yet, so indicies have to fit in RAM. Follow #4 for updates.
  • This extension is written in C++ and doesn't have fuzzy testing yet. Follow #5 for updates.
  • UPDATE statements on vss0 virtual tables are not supported, though INSERT and DELETE statements are. Follow #7 for updates.

Supporting

I (Alex πŸ‘‹πŸΌ) spent a lot of time and energy on this project and many other open source projects. If your company or organization uses this library (or you're feeling generous), then please consider sponsoring my work, sharing this project with a friend, or hiring me for contract/consulting work!

See Also

More Repositories

1

sqlite-lines

A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)
C
371
star
2

dataflow

An experimental self-hosted Observable notebook editor, with support for FileAttachments, Secrets, custom standard libraries, and more!
JavaScript
346
star
3

sqlite-html

A SQLite extension for querying, manipulating, and creating HTML elements.
Go
341
star
4

sqlite-loadable-rs

A framework for writing fast and performant SQLite extensions in Rust
Rust
246
star
5

sqlite-http

A SQLite extension for making HTTP requests purely in SQL
Go
173
star
6

sqlite-regex

A fast regular expression SQLite extension, written in Rust
Rust
120
star
7

unofficial-observablehq-compiler

An unofficial compiler for Observable notebook syntax
JavaScript
99
star
8

sqlite-xsv

the fastest CSV SQLite extension, written in Rust
Rust
88
star
9

sqlite-ulid

A SQLite extension for generating and working with ULIDs
Python
79
star
10

streamlit-observable

Embed Observable notebooks into Streamlit apps!
TypeScript
79
star
11

observable-prerender

Pre-render Observable notebooks for automation
JavaScript
55
star
12

sqlite-ecosystem

An overview of all my SQLite extensions, and a roadmap for future extensions and tooling!
TypeScript
53
star
13

sqlite-jsonschema

A SQLite extension for validating JSON objects with JSON Schema
TypeScript
25
star
14

sqlite-geo

A work-in-progress SQLite extension for geospatial data
Rust
24
star
15

sqlite-url

A SQLite extension for parsing, generating, and querying URLs and query strings
C
22
star
16

sqlite-path

A SQLite extension for parsing, generating, and querying paths
C
19
star
17

sqlite-fastrand

A SQLite extension for quickly generating random numbers, booleans, characters, and blobs
Rust
16
star
18

sqlite-vector

A SQLite extension for working with float and binary vectors. Work in progress!
C++
14
star
19

atom-observable

Render Observable notebooks in Atom!
JavaScript
11
star
20

oak

A CLI tool for reproducible, customizable data workflows.
TypeScript
11
star
21

sqlite-base64

Python
8
star
22

sqlite-python

Rust
7
star
23

sqlite-parquet

Rust
6
star
24

sqlite-md

A SQLite extension for parsing, querying, and generating HTML from Markdown documents.
Rust
6
star
25

churro

Access your terminal in an Observable notebook!
JavaScript
5
star
26

sqlite-image

Rust
4
star
27

sqlite-hello

Elixir
4
star
28

TritonFind

A Facebook Messenger Bot that assists UC San Diego students get their lost items back.
JavaScript
4
star
29

sqlite.org-extensions

C
4
star
30

NeverAgainColleges

HTML
3
star
31

sqlite-xml

Rust
3
star
32

ondemand-whisper-fly

Python
3
star
33

streamlit-observable-showcase

An example Streamlit app showcasing the features of the streamlit-observable package.
Python
3
star
34

os-club-constitution

Constitution for a (future) open source club at UCSD
3
star
35

UCSD-Professional-Website-Maker

Make a very professional website for your acsweb.ucsd.edu/~ account
HTML
3
star
36

sqlite-pdf

Rust
2
star
37

unofficial-observable-notebook-react

JavaScript
2
star
38

disney-plus-analysis

Python
2
star
39

har-to-sqlite

TypeScript
2
star
40

shl

A tagged template literal for shell commands! (Work in Progress)
JavaScript
2
star
41

upload-spm

TypeScript
2
star
42

SDHacks-2016---Trivents-Messenger-Bot

Messenger Bot our team made at SDHacks 2016, Winner of the DocuSign Sponsor Prize
JavaScript
1
star
43

sqlite-versions

TypeScript
1
star
44

sqlite-fake

Rust
1
star
45

native-lands-colleges

JavaScript
1
star
46

sqlite-semver

Rust
1
star
47

isthegovernmentshutdown.info

HTML
1
star
48

TritonPal-Chrome-Extension

A Chrome Extension that will help all UC San Diego students.
JavaScript
1
star