• Stars
    star
    106
  • Rank 325,871 (Top 7 %)
  • Language
    C
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

SQLwrite: AI in your DBMS! Automatically converts natural language queries to SQL.

SQLwrite

by Emery Berger

Integrates AI into your database: automatically converts natural language queries into SQL, and then runs the SQL query. As far as we are aware, this is the first integration of LLMs to enable natural language queries into a production database manager. Currently works as an extension to SQLite3 (more to come). In addition to generating queries, SQLwrite also produces suggestions to improve query performance (e.g., creating new indices).

NOTE: To use SQLwrite, you must first set up an OpenAI API key. If you already have an API key, you can set it as an environment variable called OPENAI_API_KEY. If you do not have one yet, you can get a key here: https://platform.openai.com/account/api-keys

export OPENAI_API_KEY=<your-api-key>

Examples

These example queries use a large SQLite database with multiple tables:

Getting started

% export OPENAI_API_KEY=<your-api-key>
% ./sqlite3 Chinook_Sqlite.sqlite
sqlite> .load sqlwrite
SQLwrite extension successfully initialized. You can now use natural language queries like "select ask('show me all artists.');".
Please report any issues to https://github.com/plasma-umass/sqlwrite/issues/new

Basic queries

sqlite> select ask('show me the total invoiced for all artists.');
2328.6
[SQLwrite] translation to SQL: SELECT SUM(Total) AS total_invoiced FROM Invoice;
[SQLwrite] indexing suggestions to improve the performance for this query:
(1): CREATE INDEX idx_invoice_total ON Invoice (Total);
(2): CREATE INDEX idx_customer_invoice ON Invoice (CustomerId);

Queries with JOINs

sqlite> select ask('show me the total invoiced for all artists whose last name starts with "S"');
306.98
[SQLwrite] translation to SQL: SELECT sum(Invoice.Total) as total_invoiced FROM Invoice JOIN Customer ON Invoice.CustomerId = Customer.CustomerId WHERE Customer.LastName LIKE 'S%'
[SQLwrite] indexing suggestions to improve the performance for this query:
(1): CREATE INDEX idx_invoice_customer_lastname ON Invoice (CustomerId) WHERE (CustomerId IN (SELECT CustomerId FROM Customer WHERE LastName LIKE 'S%'))
(2): CREATE INDEX idx_customer_lastname ON Customer (LastName)

Complex query synthesis with multiple JOINs

sqlite> select ask('give me a list of all artists (with no duplicates) whose genre is reggae');
Cidade Negra
Lenny Kravitz
UB40
[SQLwrite] translation to SQL: SELECT DISTINCT Artist.Name FROM Artist JOIN Album ON Album.ArtistId = Artist.ArtistId JOIN Track ON Track.AlbumId = Album.AlbumId JOIN Genre ON Track.GenreId = Genre.GenreId WHERE Genre.Name = 'Reggae';
[SQLwrite] indexing suggestions to improve the performance for this query:
(1): CREATE INDEX [IFK_TrackGenreReggae] ON [Track] ([GenreId]) WHERE Name = 'Reggae';

Natural languages besides English!

sqlite> select ask('Haz una lista de todos los músicos cuyos nombres empiezan con la letra L');
22|Led Zeppelin
33|Luiz Melodia
99|Legião Urbana
100|Lenny Kravitz
101|Lulu Santos
149|Lost
162|Los Lonely Boys
187|Los Hermanos
201|Luciana Souza/Romero Lubambo
223|London Symphony Orchestra & Sir Charles Mackerras
227|Luciano Pavarotti
228|Leonard Bernstein & New York Philharmonic
258|Les Arts Florissants & William Christie
[SQLwrite] translation to SQL: SELECT * FROM Artist WHERE Name LIKE 'L%';
[SQLwrite] indexing suggestions to improve the performance for this query:
(1): CREATE INDEX [IFK_ArtistName] ON [Artist] ([Name]);

Installation

Download and run make. Currently Mac and Linux only. You may need to install the curl library.

Ubuntu

sudo apt install libcurl4-gnutls-dev

Usage

Either use the built-in SQLite (if it was built to allow extensions), or run the generated sqlite3 file on your database:

./sqlite3 my_database.db

and then run the following command to load the SQLwrite extension:

select load_extension("the_path_to_your_sqlwrite_directory/sqlwrite");

or:

.load the_path_to_your_sqlwrite_directory/sqlwrite

You can now issue English language queries by using the ask function:

SELECT ask('(whatever you want)');

Acknowledgements

SQLwrite includes SQLite3 (https://www.sqlite.org/index.html), and is built with the assistance of several excellent libraries, whose code is (for now) included in this repository:

More Repositories

1

scalene

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Python
12,131
star
2

coz

Coz: Causal Profiling
C
4,024
star
3

browsix

Browsix is a Unix-like operating system for the browser.
JavaScript
3,149
star
4

doppio

Breaks the browser language barrier (includes a plugin-free JVM).
TypeScript
2,150
star
5

Mesh

A memory allocator that automatically reduces the memory footprint of C/C++ applications.
C++
1,618
star
6

ChatDBG

ChatDBG - AI-assisted debugging. Uses AI to answer 'why'
C++
772
star
7

slipcover

Near Zero-Overhead Python Code Coverage
Python
485
star
8

BLeak

BLeak: Automatically Debugging Memory Leaks in Web Applications
TypeScript
408
star
9

cwhy

"See why!" Explains and suggests fixes for compile-time errors for C, C++, C#, Go, Java, LaTeX, PHP, Python, Ruby, Rust, and TypeScript
C++
272
star
10

NextDoor

Graph Sampling using GPU
Cuda
49
star
11

DataDebug

Excel 2010/2013 add-in that automatically finds errors in spreadsheets
C#
46
star
12

coverup

Automatic AI-powered test suite generator
Python
37
star
13

systemgo

Init system in Go, intended to run on Browsix and other Unix-like OS. Part of GSoC 2016 project.
Go
36
star
14

sheriff

Sheriff consists of two tools: Sheriff-Detect, a false-sharing detector, and Sheriff-Protect, a false-sharing eliminator that you can link with your code to eliminate false sharing.
C++
29
star
15

DoubleTake

Evidence-based dynamic analysis: a fast checker for memory errors.
C
21
star
16

commentator

Automatically comments Python code, adding docstrings and type annotations, with optional translation to other languages.
Python
20
star
17

Predator

Predator: Predictive False Sharing Detection
C
19
star
18

memory-landscape

The space of memory management research and systems produced by the PLASMA lab (https://plasma-umass.org).
16
star
19

snakefish

parallel Python
Python
13
star
20

entroprise

measure entropy of memory allocators
C++
12
star
21

parcel

An Excel formula parser
C#
12
star
22

Rehearsal

Rehearsal: A Configuration Verification Tool for Puppet
Scala
12
star
23

Hound

Hound memory leak detector
C++
11
star
24

smash-project

Smash compressing allocator project
C++
10
star
25

browsix-spec

JavaScript
9
star
26

Archipelago

Archipelago memory allocator
C
8
star
27

simplesocket

A simple socket wrapper for C++.
C++
8
star
28

pythoness

Pythoness: use natural language to define Python functions.
Python
7
star
29

compsci631

Support code for Programming Languages (COMPSCI631)
OCaml
7
star
30

Tortoise

Tortoise: Interactive System Configuration Repair
Scala
6
star
31

scalene-gui

Scalene web GUI
JavaScript
5
star
32

llm-utils

Utilities for our LLM projects (CWhy, ChatDBG, ...).
Python
5
star
33

transparentFS

TransparentFS code, paper, and slides
C
5
star
34

homebrew-scalene

Homebrew tap for Scalene (emeryberger/scalene)
Ruby
4
star
35

GSoC

Description of our Google Summer of Code projects for 2015
4
star
36

vam

Implementation from "A Locality-Improving Dynamic Memory Allocator", Feng and Berger, MSP 2005
C++
4
star
37

HeapToss

HeapToss is an LLVM compiler pass that moves stack variables that may escape their declaring function's context into the heap.
3
star
38

pytest-cleanslate

Python
3
star
39

jsvm

JavaScript
2
star
40

GSoC-2013

Google Summer of Code 2013
2
star
41

plasma-umass.github.io

home page
HTML
2
star
42

spl

Rust
2
star
43

doppio_jcl

Scripts that produce a version of the Java Class Library and Java Home in a way that is compatible with DoppioJVM.
TypeScript
2
star
44

nextdoor-eurosys21

HTML
1
star
45

mesh-testsuite

C
1
star
46

ChatSheet

Python
1
star
47

custom-public

Jupyter Notebook
1
star
48

proto

probabilistic race tolerance
C
1
star
49

wasm-gc-template

C++
1
star
50

typissed

Generates MTurk typo jobs
C#
1
star
51

scalene-benchmarks

Benchmarks comparing Scalene with other commonly-used profilers
Python
1
star
52

emcc_control

C
1
star
53

transparentMM

Transparent memory management
1
star