• Stars
    star
    371
  • Rank 111,157 (Top 3 %)
  • Language
    C
  • License
    MIT License
  • Created about 2 years ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)

sqlite-lines

sqlite-lines is a SQLite extension for reading lines from a file or blob.

Benchmark between sqlite-lines and various other data processing tools

See Benchmarks for more info.

Usage

.load ./lines0
select line from lines_read('logs.txt');

sqlite-lines is great for line-oriented datasets, like ndjson or JSON Lines, when paired with SQLite's JSON support. Here, we calculate the top 5 country participants in Google's Quick, Draw! dataset for calendars.ndjson:

select
  line ->> '$.countrycode' as countrycode,
  count(*)
from lines_read('./calendar.ndjson')
group by 1
order by 2 desc
limit 5;
/*
┌─────────────┬──────────┐
│ countrycode │ count(*) │
├─────────────┼──────────┤
│ US          │ 141001   │
│ GB          │ 22560    │
│ CA          │ 11759    │
│ RU          │ 9250     │
│ DE          │ 8748     │
└─────────────┴──────────┘
*/

Use the SQLite CLI's fsdir() table functions with lines_read() to read lines from every file in a directory.

select
  name as file,
  lines.rowid as line_number,
  line
from fsdir('logs')
join lines_read(name) as lines
where name like '%.txt';
/*
┌─────────────────────┬──────┐
│ file  │ line_number | line │
├───────┼─────────────┤──────┤
| a.txt | 1           | x    |
| a.txt | 2           | y    |
| a.txt | 3           | z    |
| b.txt | 1           | xx   |
| b.txt | 2           | yy   |
| c.txt | 1           | xxx  |
└───────┴─────────────┴──────┘
*/

Documentation

See docs.md for a full API Reference and detailed documentation.

Installing

Language Install
Python pip install sqlite-lines PyPI
Datasette datasette install datasette-sqlite-lines Datasette
Node.js npm install sqlite-lines npm
Deno deno.land/x/sqlite_lines deno.land/x release
Ruby gem install sqlite-lines Gem
Github Release GitHub tag (latest SemVer pre-release)

The Releases page contains pre-built binaries for Linux amd64 and MacOS (amd64, no arm).

As a loadable extension

If you want to use sqlite-lines as a Runtime-loadable extension, Download the lines0.dylib (for MacOS) or lines0.so file from a release and load it into your SQLite environment.

Note: The 0 in the filename (lines0.dylib or lines0.so) denotes the major version of sqlite-lines. Currently sqlite-lines is pre v1, so expect breaking changes in future versions.

For example, if you are using the SQLite CLI, you can load the library like so:

.load ./lines0
select lines_version();
-- v0.0.1

Or in Python, using the builtin sqlite3 module:

import sqlite3

con = sqlite3.connect(":memory:")

con.enable_load_extension(True)
con.load_extension("./lines0")

print(con.execute("select lines_version()").fetchone())
# ('v0.0.1',)

Or in Node.js using better-sqlite3:

const Database = require("better-sqlite3");
const db = new Database(":memory:");

db.loadExtension("./lines0");

console.log(db.prepare("select lines_version()").get());
// { 'lines_version()': 'v0.0.1' }

Or with Datasette (using the "no filesystem" version to limit security vulnerabilities):

datasette data.db --load-extension ./lines_nofs0

Windows is not supported - yet!

From the browser with WASM/JavaScript

sqlite-lines is also distributed as a standalone SQL.js library. It's essentially a fork of the original SQL.js library, with the addition of sqlite-lines functions like lines_version() and lines().

Check out this Observable notebook for the full demonstration. The Releases page contains the JavaScript and WASM files.

The sqlite-lines CLI

sqlite-lines comes with an example CLI modeled after ndjson-cli that demos the speed and versatility of sqlite-lines. Download a pre-compiled version from the Releases page, or build yourself with:

make cli
./dist/sqlite-lines

The sqlite-lines CLI reads data from stdin and applies transformations with SQL code through its arguments.

The first argument should be a SQL expression that is used transform a single line from stdlin. The available columns are rowid, which is the "line number" that is being processed, and d, an alias for line, which is the text content of the current line (inspired by ndjson-cli). For example, to uppercase every line from a file with upper():

$ cat names.txt | sqlite-lines 'rowid || upper(d)'
1ALEX
2BRIAN
3CRAIG

This includes SQLite's new JSON -> and ->> operators for NDJSON/JSONL files:

$ cat data.ndjson | sqlite-lines 'd ->> "$.id"'
$ cat data.ndjson | sqlite-lines 'json_object("name", d ->> "$.name", "age": d ->> "$.stats.age")'

The second argument is another SQL expression that's used in the WHERE statement of the underlying SQL query to filter out lines.

# get the names of all people older than 40
cat data.ndjson | sqlite-lines 'd ->> "$.name"' 'd ->> "$.age" > 40'

The third argument is another SQL expression that's used in the GROUP BY statement of the underlying SQL query to aggregate lines.

A Note on CSV Parsing

sqlite-lines isn't a great option for CSVs. Technically you can, but the moment your data has a \n character in a field or header, then you'll get corrupted results.

Instead, you should use the "official" CSV Virtual Table, or use the .import command in the SQLite CLI.

More Repositories

1

sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
C++
482
star
2

dataflow

An experimental self-hosted Observable notebook editor, with support for FileAttachments, Secrets, custom standard libraries, and more!
JavaScript
346
star
3

sqlite-html

A SQLite extension for querying, manipulating, and creating HTML elements.
Go
341
star
4

sqlite-loadable-rs

A framework for writing fast and performant SQLite extensions in Rust
Rust
246
star
5

sqlite-http

A SQLite extension for making HTTP requests purely in SQL
Go
173
star
6

sqlite-regex

A fast regular expression SQLite extension, written in Rust
Rust
120
star
7

unofficial-observablehq-compiler

An unofficial compiler for Observable notebook syntax
JavaScript
99
star
8

sqlite-xsv

the fastest CSV SQLite extension, written in Rust
Rust
88
star
9

sqlite-ulid

A SQLite extension for generating and working with ULIDs
Python
79
star
10

streamlit-observable

Embed Observable notebooks into Streamlit apps!
TypeScript
79
star
11

observable-prerender

Pre-render Observable notebooks for automation
JavaScript
55
star
12

sqlite-ecosystem

An overview of all my SQLite extensions, and a roadmap for future extensions and tooling!
TypeScript
53
star
13

sqlite-jsonschema

A SQLite extension for validating JSON objects with JSON Schema
TypeScript
25
star
14

sqlite-geo

A work-in-progress SQLite extension for geospatial data
Rust
24
star
15

sqlite-url

A SQLite extension for parsing, generating, and querying URLs and query strings
C
22
star
16

sqlite-path

A SQLite extension for parsing, generating, and querying paths
C
19
star
17

sqlite-fastrand

A SQLite extension for quickly generating random numbers, booleans, characters, and blobs
Rust
16
star
18

sqlite-vector

A SQLite extension for working with float and binary vectors. Work in progress!
C++
14
star
19

atom-observable

Render Observable notebooks in Atom!
JavaScript
11
star
20

oak

A CLI tool for reproducible, customizable data workflows.
TypeScript
11
star
21

sqlite-base64

Python
8
star
22

sqlite-python

Rust
7
star
23

sqlite-parquet

Rust
6
star
24

sqlite-md

A SQLite extension for parsing, querying, and generating HTML from Markdown documents.
Rust
6
star
25

churro

Access your terminal in an Observable notebook!
JavaScript
5
star
26

sqlite-image

Rust
4
star
27

sqlite-hello

Elixir
4
star
28

TritonFind

A Facebook Messenger Bot that assists UC San Diego students get their lost items back.
JavaScript
4
star
29

sqlite.org-extensions

C
4
star
30

NeverAgainColleges

HTML
3
star
31

sqlite-xml

Rust
3
star
32

ondemand-whisper-fly

Python
3
star
33

streamlit-observable-showcase

An example Streamlit app showcasing the features of the streamlit-observable package.
Python
3
star
34

os-club-constitution

Constitution for a (future) open source club at UCSD
3
star
35

UCSD-Professional-Website-Maker

Make a very professional website for your acsweb.ucsd.edu/~ account
HTML
3
star
36

sqlite-pdf

Rust
2
star
37

unofficial-observable-notebook-react

JavaScript
2
star
38

disney-plus-analysis

Python
2
star
39

har-to-sqlite

TypeScript
2
star
40

shl

A tagged template literal for shell commands! (Work in Progress)
JavaScript
2
star
41

upload-spm

TypeScript
2
star
42

SDHacks-2016---Trivents-Messenger-Bot

Messenger Bot our team made at SDHacks 2016, Winner of the DocuSign Sponsor Prize
JavaScript
1
star
43

sqlite-versions

TypeScript
1
star
44

sqlite-fake

Rust
1
star
45

native-lands-colleges

JavaScript
1
star
46

pypi-datasette-packages

fork fork fork
Shell
1
star
47

sqlite-semver

Rust
1
star
48

isthegovernmentshutdown.info

HTML
1
star
49

TritonPal-Chrome-Extension

A Chrome Extension that will help all UC San Diego students.
JavaScript
1
star