LearnDB
What I Cannot Create, I Do Not Understand -Richard Feynman
In the spirit of Feynman's immortal words, the goal of this project is to better understand the internals of databases by implementing a relational database management system (RDBMS), a sqlite clone, from scratch.
This project was motivated by a desire to: 1) understand databases more deeply and 2) work on a fun project. These dual goals led to a:
- relatively simple code base
- relatively complete RDBMS implementation
- written in pure python
- No build step
- zero configuration
- configuration can be overridden
This makes the learndb codebase great for tinkering with. But the product has some key limitations that mean it shouldn't be used as an actual storage solution.
Features
Learndb supports the following:
- it has a rich sql (learndb-sql) with support for select, from, where, group by, having, limit, order by
- custom lexer and parser built using lark
- at a high level, there is an engine that can accept some SQL statements. These statements express operations on a database (a collection of tables which contain data)
- allows users/agents to connect to the RDBMS in multiple ways:
  - REPL
  - importing python module
  - passing a file of commands to the engine
- on-disk btree implementation as backing data structure
Limitations
- Very simplified implementation of floating point number arithmetic (e.g. compared to IEEE 754); see footnote 1.
- No support for common utility features, like wildcard column expansion, e.g. select * ...
- More limitations
Getting Started: Tinkering and Beyond
- To get started with learndb, first start with tutorial.md.
- Then, to understand the system at a deeper technical level, read reference.md. This is essentially a complete reference manual directed at a user of the system. It outlines the operations and capabilities of the system, and describes what is (un)supported and what is undefined behavior.
- Architecture.md: this provides a component-level breakdown of the repo and the system.
Hacking
Install
- System requirements
  - requires a linux/macos system, since it uses fcntl to get exclusive read access on database file
  - python >= 3.9
- To install for development, i.e. so that src can be edited without having to reinstall:
  - cd <repo_root>
  - create virtualenv: python3 -m venv venv
  - activate venv: source venv/bin/activate
  - install requirements: python -m pip install -r requirements.txt
  - install Learndb in edit mode: python3 -m pip install -e .
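The system requirements above mention that fcntl is used to get exclusive access to the database file. As a rough illustration of what such locking can look like, here is a minimal sketch. It assumes `fcntl.flock` with a non-blocking exclusive lock; learndb itself may use a different call (for example `fcntl.lockf`) or different flags. The helper name `try_exclusive_lock` and the file name `example.db` are hypothetical and not part of learndb.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open `path` and try to take an exclusive, non-blocking advisory lock.

    Returns the open file object on success (keep it open to hold the lock),
    or None if the lock is already held through another open of the file.
    """
    f = open(path, "a+b")  # creates the file if it does not exist
    try:
        # LOCK_EX: exclusive lock; LOCK_NB: fail immediately instead of waiting.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


# Hypothetical usage: a throwaway file stands in for a database file.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

first = try_exclusive_lock(db_path)    # first opener gets the lock
second = try_exclusive_lock(db_path)   # second opener is refused while it is held
first_acquired = first is not None
second_acquired = second is not None

first.close()                          # closing the file releases the lock
third = try_exclusive_lock(db_path)    # now the lock can be taken again
third_acquired = third is not None
third.close()
```

Because the `fcntl` module is only available on Unix-like systems, a dependency of this kind is consistent with the linux/macos requirement stated above.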
Run REPL
source venv/bin/activate
python run_learndb.py repl
Run Tests
- Run all tests:
  python -m pytest tests/*.py
- Run btree tests:
  python -m pytest -s tests/btree_tests.py  # stdout
  python -m pytest tests/btree_tests.py  # suppressed out
- Run end-to-end tests:
  python -m pytest -s tests/e2e_tests.py
- Run end-to-end tests (employees):
  python -m pytest -s tests/e2e_tests_employees.py
  python -m pytest -s tests/e2e_tests_employees.py -k test_equality_select
- Run serde tests:
  ... serde_tests.py
- Run language parser tests:
  ... lang_tests.py
- Run specific test:
  python -m pytest tests.py -k test_name
- Clear pytest cache:
  python -m pytest --cache-clear
References consulted
- I started this project by following cstack's awesome tutorial
- Later I was primarily referencing: SQLite Database System: Design and Implementation (1st ed)
- Sqlite file format: docs
- Postgres for how certain SQL statements are implemented and how their documentation is organized
Project Management
- imminent work/issues are tracked in tasks.md
- long-term ideas are tracked in docs/future-work.md
Footnotes
1. When evaluating the difference between two floats, e.g. 3.2 > 4.2, I consider the condition True if the difference between the two exceeds some fixed delta. The accepted epsilon should scale with the magnitude of the number.
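To make the footnote concrete, here is a small sketch contrasting a fixed-delta comparison with one whose tolerance scales with magnitude. This is one reading of the footnote, not learndb's actual code: the function names, the delta of 1e-6, and the use of `math.isclose` for the scaled variant are all assumptions made for illustration.

```python
import math


def greater_fixed(a, b, delta=1e-6):
    """a > b, where differences of at most `delta` do not count (assumed delta)."""
    return (a - b) > delta


def greater_relative(a, b, rel_tol=1e-9):
    """a > b, unless a and b are close relative to their own magnitude."""
    return a > b and not math.isclose(a, b, rel_tol=rel_tol)


# Ordinary magnitudes: both approaches agree.
ordinary_fixed = greater_fixed(4.2, 3.2)
ordinary_relative = greater_relative(4.2, 3.2)

# Small magnitudes: 3e-7 is three times 1e-7, but the gap is below the fixed delta.
small_fixed = greater_fixed(3e-7, 1e-7)
small_relative = greater_relative(3e-7, 1e-7)

# Large magnitudes: a gap of 1 is tiny relative to 1e15, but far above the fixed delta.
large_fixed = greater_fixed(1e15 + 1, 1e15)
large_relative = greater_relative(1e15 + 1, 1e15)
```

With a fixed delta, small values that differ by a large factor can compare as equal, while a scaled tolerance treats them as distinct. Whether large values that differ by one unit should compare as equal depends on the tolerance chosen.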