• Stars
    star
    120
  • Rank 285,668 (Top 6 %)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A fast regular expression SQLite extension, written in Rust

sqlite-regex

A fast and performant SQLite extension for regular expressions. Based on sqlite-loadable-rs, and the regex crate.

See Introducing sqlite-regex: The fastest Regular Expression Extension for SQLite (Jan 2023) for more details!

If your company or organization finds this library useful, consider supporting my work!

Usage

.load ./regex0
select 'foo' regexp 'f';

Find all occurrences of a pattern in a string

select regex_find(
  '[0-9]{3}-[0-9]{3}-[0-9]{4}',
  'phone: 111-222-3333'
);
-- '111-222-3333'

select rowid, *
from regex_find_all(
  '\b\w{13}\b',
  'Retroactively relinquishing remunerations is reprehensible.'
);
/*
┌───────┬───────┬─────┬───────────────┐
│ rowid │ start │ end │     match     │
├───────┼───────┼─────┼───────────────┤
│ 0     │ 0     │ 13  │ Retroactively │
│ 1     │ 14    │ 27  │ relinquishing │
│ 2     │ 28    │ 41  │ remunerations │
│ 3     │ 45    │ 58  │ reprehensible │
└───────┴───────┴─────┴───────────────┘
*/

Extract capture group values by index or name

select
  regex_capture(captures, 0)        as entire_match,
  regex_capture(captures, 'title')  as title,
  regex_capture(captures, 'year')   as year
from regex_captures(
  regex("'(?P<title>[^']+)'\s+\((?P<year>\d{4})\)"),
  "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931)."
);
/*
┌───────────────────────────┬──────────────────┬──────┐
│       entire_match        │      title       │ year │
├───────────────────────────┼──────────────────┼──────┤
│ 'Citizen Kane' (1941)     │ Citizen Kane     │ 1941 │
│ 'The Wizard of Oz' (1939) │ The Wizard of Oz │ 1939 │
│ 'M' (1931)                │ M                │ 1931 │
└───────────────────────────┴──────────────────┴──────┘
*/

Use RegexSets to match a string on multiple patterns in linear time

select regexset_is_match(
  regexset(
    "bar",
    "foo",
    "barfoo"
  ),
  'foobar'
)

Split the string on the given pattern delimiter

select rowid, *
from regex_split('[ \t]+', 'a b     c d    e');
/*
┌───────┬──────┐
│ rowid │ item │
├───────┼──────┤
│ 0     │ a    │
│ 1     │ b    │
│ 2     │ c    │
│ 3     │ d    │
│ 4     │ e    │
└───────┴──────┘
*/

Replace occurrences of a pattern with another string

select regex_replace(
  '(?P<last>[^,\s]+),\s+(?P<first>\S+)',
  'Springsteen, Bruce',
  '$first $last'
);
-- 'Bruce Springsteen'

select regex_replace_all('a', 'abc abc', '');
-- 'bc bc'

Documentation

See docs.md for a full API reference.

Installing

Language Install
Python pip install sqlite-regex PyPI
Datasette datasette install datasette-sqlite-regex Datasette
Node.js npm install sqlite-regex npm
Deno deno.land/x/sqlite_regex deno.land/x release
Ruby gem install sqlite-regex Gem
Github Release GitHub tag (latest SemVer pre-release)
Rust cargo add sqlite-regex Crates.io

The Releases page contains pre-built binaries for Linux x86_64, MacOS, and Windows.

As a loadable extension

If you want to use sqlite-regex as a Runtime-loadable extension, Download the regex0.dylib (for MacOS), regex0.so (Linux), or regex0.dll (Windows) file from a release and load it into your SQLite environment.

Note: The 0 in the filename (regex0.dylib/ regex0.so/regex0.dll) denotes the major version of sqlite-regex. Currently sqlite-regex is pre v1, so expect breaking changes in future versions.

For example, if you are using the SQLite CLI, you can load the library like so:

.load ./regex0
select regex_version();
-- v0.1.0

Or in Python, using the builtin sqlite3 module:

import sqlite3
con = sqlite3.connect(":memory:")
con.enable_load_extension(True)
con.load_extension("./regex0")
print(con.execute("select regex_version()").fetchone())
# ('v0.1.0',)

Or in Node.js using better-sqlite3:

const Database = require("better-sqlite3");
const db = new Database(":memory:");
db.loadExtension("./regex0");
console.log(db.prepare("select regex_version()").get());
// { 'regex_version()': 'v0.1.0' }

Or with Datasette:

datasette data.db --load-extension ./regex0

Supporting

I (Alex 👋🏼) spent a lot of time and energy on this project and many other open source projects. If your company or organization uses this library (or you're feeling generous), then please consider supporting my work, or share this project with a friend!

See also

More Repositories

1

sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
C++
482
star
2

sqlite-lines

A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)
C
371
star
3

dataflow

An experimental self-hosted Observable notebook editor, with support for FileAttachments, Secrets, custom standard libraries, and more!
JavaScript
346
star
4

sqlite-html

A SQLite extension for querying, manipulating, and creating HTML elements.
Go
341
star
5

sqlite-loadable-rs

A framework for writing fast and performant SQLite extensions in Rust
Rust
246
star
6

sqlite-http

A SQLite extension for making HTTP requests purely in SQL
Go
173
star
7

unofficial-observablehq-compiler

An unofficial compiler for Observable notebook syntax
JavaScript
99
star
8

sqlite-xsv

the fastest CSV SQLite extension, written in Rust
Rust
88
star
9

sqlite-ulid

A SQLite extension for generating and working with ULIDs
Python
79
star
10

streamlit-observable

Embed Observable notebooks into Streamlit apps!
TypeScript
79
star
11

observable-prerender

Pre-render Observable notebooks for automation
JavaScript
55
star
12

sqlite-ecosystem

An overview of all my SQLite extensions, and a roadmap for future extensions and tooling!
TypeScript
53
star
13

sqlite-jsonschema

A SQLite extension for validating JSON objects with JSON Schema
TypeScript
25
star
14

sqlite-geo

A work-in-progress SQLite extension for geospatial data
Rust
24
star
15

sqlite-url

A SQLite extension for parsing, generating, and querying URLs and query strings
C
22
star
16

sqlite-path

A SQLite extension for parsing, generating, and querying paths
C
19
star
17

sqlite-fastrand

A SQLite extension for quickly generating random numbers, booleans, characters, and blobs
Rust
16
star
18

sqlite-vector

A SQLite extension for working with float and binary vectors. Work in progress!
C++
14
star
19

atom-observable

Render Observable notebooks in Atom!
JavaScript
11
star
20

oak

A CLI tool for reproducible, customizable data workflows.
TypeScript
11
star
21

sqlite-base64

Python
8
star
22

sqlite-python

Rust
7
star
23

sqlite-parquet

Rust
6
star
24

sqlite-md

A SQLite extension for parsing, querying, and generating HTML from Markdown documents.
Rust
6
star
25

churro

Access your terminal in an Observable notebook!
JavaScript
5
star
26

sqlite-image

Rust
4
star
27

sqlite-hello

Elixir
4
star
28

TritonFind

A Facebook Messenger Bot that assists UC San Diego students get their lost items back.
JavaScript
4
star
29

sqlite.org-extensions

C
4
star
30

NeverAgainColleges

HTML
3
star
31

sqlite-xml

Rust
3
star
32

ondemand-whisper-fly

Python
3
star
33

streamlit-observable-showcase

An example Streamlit app showcasing the features of the streamlit-observable package.
Python
3
star
34

os-club-constitution

Constitution for a (future) open source club at UCSD
3
star
35

UCSD-Professional-Website-Maker

Make a very professional website for your acsweb.ucsd.edu/~ account
HTML
3
star
36

sqlite-pdf

Rust
2
star
37

unofficial-observable-notebook-react

JavaScript
2
star
38

disney-plus-analysis

Python
2
star
39

har-to-sqlite

TypeScript
2
star
40

shl

A tagged template literal for shell commands! (Work in Progress)
JavaScript
2
star
41

upload-spm

TypeScript
2
star
42

SDHacks-2016---Trivents-Messenger-Bot

Messenger Bot our team made at SDHacks 2016, Winner of the DocuSign Sponsor Prize
JavaScript
1
star
43

sqlite-versions

TypeScript
1
star
44

sqlite-fake

Rust
1
star
45

native-lands-colleges

JavaScript
1
star
46

sqlite-semver

Rust
1
star
47

isthegovernmentshutdown.info

HTML
1
star
48

TritonPal-Chrome-Extension

A Chrome Extension that will help all UC San Diego students.
JavaScript
1
star