• Stars
    star
    3,469
  • Rank 12,843 (Top 0.3 %)
  • Language
    TypeScript
  • License
    Apache License 2.0
  • Created over 3 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Hosting read-only SQLite databases on static file hosters like Github Pages

sql.js-httpvfs

See my blog post for an introduction: https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

sql.js is a light wrapper around SQLite compiled with EMScripten for use in the browser (client-side).

This repo is a fork of and wrapper around sql.js to provide a read-only HTTP-Range-request based virtual file system for SQLite. It allows hosting an SQLite database on a static file hoster and querying that database from the browser without fully downloading it.

The virtual file system is an emscripten filesystem with some "smart" logic to accelerate fetching with virtual read heads that speed up when sequential data is fetched. It could also be useful to other applications, the code is in lazyFile.ts. It might also be useful to implement this lazy fetching as an SQLite VFS since then SQLite could be compiled with e.g. WASI SDK without relying on all the emscripten OS emulation.

Note that this whole thing only works well if your database and indexes are structured well.

sql.js-httpvfs also provides a proof-of-concept level implementation of a DOM virtual table that allows interacting (read/write) with the browser DOM directly from within SQLite queries.

Usage

(optional) First, improve your SQLite database:

-- first, add whatever indices you need. Note that here having many and correct indices is even more important than for a normal database.
pragma journal_mode = delete; -- to be able to actually set page size
pragma page_size = 1024; -- trade off of number of requests that need to be made vs overhead. 

insert into ftstable(ftstable) values ('optimize'); -- for every FTS table you have (if you have any)

vacuum; -- reorganize database and apply changed page size

(optional) Second, split the database into chunks and generate a json config using the create_db.sh script. This is needed if your hoster has a maximum file size. It can also be a good idea generally depending on your CDN since it allows selective CDN caching of the chunks your users actually use and reduces cache eviction.

Finally, install sql.js-httpvfs from npm and use it in TypeScript / JS!

Here's an example for people familiar with the JS / TS world. At the bottom of this readme there's a more complete example for those unfamiliar.

import { createDbWorker } from "sql.js-httpvfs"

// sadly there's no good way to package workers and wasm directly so you need a way to get these two URLs from your bundler.
// This is the webpack5 way to create a asset bundle of the worker and wasm:
const workerUrl = new URL(
  "sql.js-httpvfs/dist/sqlite.worker.js",
  import.meta.url,
);
const wasmUrl = new URL(
  "sql.js-httpvfs/dist/sql-wasm.wasm",
  import.meta.url,
);
// the legacy webpack4 way is something like `import wasmUrl from "file-loader!sql.js-httpvfs/dist/sql-wasm.wasm"`.

// the config is either the url to the create_db script, or a inline configuration:
const config = {
  from: "inline",
  config: {
    serverMode: "full", // file is just a plain old full sqlite database
    requestChunkSize: 4096, // the page size of the  sqlite database (by default 4096)
    url: "/foo/bar/test.sqlite3" // url to the database (relative or full)
  }
};
// or:
const config = {
  from: "jsonconfig",
  configUrl: "/foo/bar/config.json"
}


let maxBytesToRead = 10 * 1024 * 1024;
const worker = await createDbWorker(
  [config],
  workerUrl.toString(),
  wasmUrl.toString(),
  maxBytesToRead // optional, defaults to Infinity
);
// you can also pass multiple config objects which can then be used as separate database schemas with `ATTACH virtualFilename as schemaname`, where virtualFilename is also set in the config object.


// worker.db is a now SQL.js instance except that all functions return Promises.

const result = await worker.db.exec(`select * from table where id = ?`, [123]);

// worker.worker.bytesRead is a Promise for the number of bytes read by the worker.
// if a request would cause it to exceed maxBytesToRead, that request will throw a SQLite disk I/O error.
console.log(await worker.worker.bytesRead);

// you can reset bytesRead by assigning to it:
worker.worker.bytesRead = 0;

Cachebusting

Alongside the url or urlPrefix, config can take an optional cacheBust property whose value will be appended as a query parameter to URLs. If you set it to a random value when you update the database you can avoid caching-related database corruption.

If using a remote config (from: 'jsonconfig'), don't forget to cachebust that too.

Debugging data fetching

If your query is fetching a lot of data and you're not sure why, try this:

  1. Look at the output of explain query plan select ......

    • SCAN TABLE t1 means the table t1 will have to be downloaded pretty much fully
    • SCAN TABLE t1 USING INDEX i1 (a=?) means direct index lookups to find a row, then table lookups by rowid
    • SCAN TABLE t1 USING COVERING INDEX i1 (a) direct index lookup without a table lookup. This is the fastest.

    You want all the columns in your WHERE clause that significantly reduce the number of results to be part of an index, with the ones reducing the result count the most coming first.

    Another useful technique is to create an index containing exactly the rows filtered by and the rows selected, which SQLite reads as a COVERING INDEX in a sequential manner (no random access at all!). For example create index i1 on tbl (filteredby1, filteredby2, selected1, selected2, selected3). This index is perfect for a query filtering by the filteredby1 and filteredby2 columns that only select the three columns at the back of the index.

  2. You can look at the dbstat virtual table to find out exactly what the pages SQLite is reading contain. For example, if you have [xhr of size 1 KiB @ 1484048 KiB] in your logs that means it's reading page 1484048. You can get the full log of read pages by using worker.getResetAccessedPages(). Check the content of pages with select * from dbstat where pageno = 1484048. Do this in an SQLite3 shell not the browser because the dbstat vtable reads the whole database.

Is this production ready?

Note that this library was mainly written for small personal projects of mine and as a demonstration. I've received requests from many people for applications that are out of the scope of this library for me (Which is awesome, and I'm happy to have inspired so many interesting new idea).

In general it works fine, but I'm not making any effort to support older or weird browsers. If the browser doesn't support WebAssembly and WebWorkers, this won't work. There's also no cache eviction, so the more data is fetched the more RAM it will use. Most of the complicated work is done by SQLite, which is well tested, but the virtual file system part doesn't have any tests.

If you want to build something new that doesn't fit with this library exactly, I'd recommend you look into these discussions and libraries:

  • The general virtual file system discussion here: sql-js/sql.js#447
  • wa-sqlite, which is a much simpler wasm wrapper for SQLite than sql.js a and has different VFSes that don't require an EMScripten dependency. sql.js-httpvfs could easily be reimplemented on top of this.
  • absurd-sql, which is an implementation of a pretty efficient VFS that allows persistence / read/write queries by storing the DB in IndexedDB

Inspiration

This project is inspired by:

The original code of lazyFile is based on the emscripten createLazyFile function, though not much of that code is remaining.

Minimal example from scratch

Here's an example of how to setup a project with sql.js-httpvfs completely from scratch, for people unfamiliar with JavaScript or NPM in general.

First, You will need node and npm. Get this from your system package manager like apt install nodejs npm.

Then, go to a new directory and add a few dependencies:

mkdir example
cd example
echo '{}' > package.json
npm install --save-dev webpack webpack-cli typescript ts-loader http-server
npm install --save sql.js-httpvfs
npx tsc --init

Edit the generated tsconfig.json file to make it more modern:

...
"target": "es2020",
"module": "es2020",
"moduleResolution": "node",
...

Create a webpack config, minimal index.html file and TypeScript entry point:

Finally, create a database:

sqlite3 example.sqlite3 "create table mytable(foo, bar)"
sqlite3 example.sqlite3 "insert into mytable values ('hello', 'world')"

and build the JS bundle and start a webserver:

./node_modules/.bin/webpack --mode=development
./node_modules/.bin/http-server

Then go to http://localhost:8080

And you should see the output to the query select * from mytable.

[{"foo":"hello","bar":"world"}]

The full code of this example is in example/.

Compiling

To compile this project (only needed if you want to modify the library itself), make sure you have emscripten, then first compile sql.js, then sql.js-httpvfs:

cd sql.js
yarn build
cd ..
yarn build

More Repositories

1

ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
Rust
8,147
star
2

sqlite-zstd

Transparent dictionary-based row-level compression for SQLite
Rust
1,447
star
3

blog

Source code of my personal blog
TypeScript
339
star
4

world-development-indicators-sqlite

Python
152
star
5

procedural-cities

Information about procedural city generation
TeX
128
star
6

nmap-log-parse

Logs which devices are in your local network and draws graphs
TypeScript
127
star
7

pandoc-url2cite

Effortlessly and transparently add correctly styled citations to your markdown paper given only a URL
TypeScript
122
star
8

youtube-sponsorship-stats

TypeScript
121
star
9

timetrackrs

An automated time tracker (WIP)
Rust
113
star
10

tv-show-ratings

Compare the episode ratings of TV shows
TypeScript
75
star
11

neural-network-demo

Demonstration and visualization of feed-forward neural networks running in the browser
TypeScript
60
star
12

tuxguitar

unofficial mirror from sourceforge svn
Java
57
star
13

backchannel-prediction

Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Python
54
star
14

tantivy-wasm

TypeScript
42
star
15

fbstats

generate facebook messaging statistics
TypeScript
38
star
16

levenshtein-demo

TypeScript
35
star
17

typed-socket.io

A library for fully typed client-server communication with socket.io and TypeScript.
TypeScript
30
star
18

encrypted-gist

Storing files in github gists, with client side transparent encryption and authentication
TypeScript
28
star
19

webrtc-remote-touch-pen-input

JavaScript
24
star
20

ts-histdbimport

Imports a `.zsh_history` file into a https://github.com/larkery/zsh-histdb sqlite database.
JavaScript
24
star
21

convolution-demo

Visualization of convolution on pairs of simple functions
TypeScript
18
star
22

thought-forge-ai

Generate 30-60 second "deep thought" TikTok-style video including a monologue, moving video scenes, music, and subtitles.
TypeScript
16
star
23

ebv-theta-to-mqtt

Perl
15
star
24

emojidome

Interactive viewer of the results of the Emojidome XKCD
TypeScript
14
star
25

dupegone

small fast duplicate file finder in c++
C++
11
star
26

redis-remotify

A tiny TypeScript library for fully typed remote calls via Redis.
TypeScript
9
star
27

mima

Mima Compiler and Interpreter for the browser
JavaScript
9
star
28

Gelddruckmaschine

Telegram bot that finds arbitrage opportunities between multiple crypto trading sites to print money mostly risk-free
JavaScript
8
star
29

warc-sqlite

POC of converting a set of WARC web archive files to a SQLite database and querying it
Python
7
star
30

phiresky.github.io

Lazy Github homepage generator
TypeScript
7
star
31

deep-intellisense

IntelliSense based on deep learning using char-rnn
TypeScript
6
star
32

endoh1-ts

Deobfuscating the ASCII fluid simulater and converting it to TypeScript
JavaScript
6
star
33

lemmy-federation-state

Quick visualization of the lemmy federation state
TypeScript
6
star
34

youtube-watch-history-parse

TypeScript
5
star
35

rpi-autousbupload

Automatically uploads photos from usb devices to an ftp server. Optimized for robustness on an Raspberry Pi.
Python
5
star
36

dfa2regex

Converts Deterministic finite automata to regular expressions.
JavaScript
5
star
37

minecraft-stats

Parses Vanilla Minecraft Server Statistics and displays graphs
JavaScript
4
star
38

fix-messy-movie-folder

Try to identify movie files in messy folders and sort them into a predefined structure
Java
4
star
39

rust-brotli-wasm

Two experiments of the brotli encoder / decoder compiled from Rust to WebAssembly.
Rust
4
star
40

RobinHood-TheLegendOfSherwood-Resolution-Patcher

Patch the resolution of Robin Hood: The Legend of Sherwood to support any resolution (e.g. 1920x1080)
C++
4
star
41

socket.io-distributor

Simple load leveling for socket.io
TypeScript
3
star
42

portfolio-bot

Telegram Bot that allows you to get information on your investments (stocks, ETFs, etc)
TypeScript
3
star
43

coronavirus-reproduction-analysis

Jupyter Notebook
3
star
44

qalc-react

Unit Calculator
TypeScript
3
star
45

gaussian-mixtures-demo

TypeScript
2
star
46

pythoven

Python
2
star
47

joint-multilingual-speech-recognition-and-language-id

TeX
2
star
48

tree-magic-cli

Rust
2
star
49

how-long-am-i-working

use google location history to graph how much time you've spent at a location (e.g. at work)
TypeScript
2
star
50

pingplot

plot a ping log
R
2
star
51

csv-cooccurrence-graph

Rust
2
star
52

RaspberrySmartScaleReceiver

Program to receive the body weight and body composition analysis data from a Soehnle 63760 BB smart scale using a 433 Mhz receiver on the Raspberry Pi.
C++
2
star
53

phiresky

1
star
54

rga-windows-test

Rust
1
star
55

bachelor-thesis

TeX
1
star
56

bayesian-aggregation-for-swarm-reinforcement-learning

Python
1
star
57

score-voting-tool

Rust
1
star
58

masters-thesis

HTML
1
star
59

algo2-summary

compact summary of stuff from the algorithms 2 lecture. Zusammenfassung der Algorithmen II Vorlesung am KIT
TeX
1
star
60

ocr-pdf-via-document-ai

OCR a set of images via the Google Cloud API
JavaScript
1
star
61

plangraph-impl

Implementation of some algorithms for planar graphs
JavaScript
1
star
62

guitar-tabs

Static hosted version of the guitar tabs portion of my old (<2010) website
CSS
1
star
63

plangraph

Mitschrieb zur Vorlesung planare Graphen KIT SS2015
Python
1
star
64

kogsys-demos

overview page for all the lecture demonstrations
HTML
1
star
65

ts-boilerplates

just some simple boilerplates for me to use
TypeScript
1
star
66

nushell-history-skim

Rust
1
star
67

prosem-proto

Simple procedural town generation (for Proseminar)
TypeScript
1
star
68

fourier-series-demo

TypeScript
1
star