  • Stars: 324
  • Rank: 126,926 (Top 3%)
  • Language: Rust
  • License: MIT License
  • Created: 9 months ago
  • Updated: 17 days ago

Repository Details

CMU-DB's Cascades optimizer framework

optd

optd (pronounced "op-dee") is a database optimizer framework. It is a cost-based optimizer that searches the plan space using user-defined rules and derives the optimal plan from the cost model and the required physical properties.
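
To make the idea of rule-driven, cost-based search concrete, here is a minimal sketch in Rust. It is not optd's API: the `Plan` type, the two rewrite rules (join commutativity and associativity), the fixed 1-in-100 join selectivity, and the cost formula are all assumptions chosen for illustration. It also enumerates plans exhaustively rather than using a Cascades-style memo table.

```rust
// Hypothetical toy model; none of these names come from optd.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: &'static str, rows: f64 },
    Join { left: Box<Plan>, right: Box<Plan> },
}

fn scan(table: &'static str, rows: f64) -> Plan {
    Plan::Scan { table, rows }
}

fn join(left: Plan, right: Plan) -> Plan {
    Plan::Join { left: Box::new(left), right: Box::new(right) }
}

/// Estimated output rows. Assumption: every join keeps 1 in 100 row pairs.
fn est_rows(plan: &Plan) -> f64 {
    match plan {
        Plan::Scan { rows, .. } => *rows,
        Plan::Join { left, right } => est_rows(left) * est_rows(right) / 100.0,
    }
}

/// Toy cost model: cost of the inputs plus the rows this node produces.
fn cost(plan: &Plan) -> f64 {
    match plan {
        Plan::Scan { rows, .. } => *rows,
        Plan::Join { left, right } => cost(left) + cost(right) + est_rows(plan),
    }
}

/// All plans reachable from `plan` by applying one rule at one node.
fn neighbors(plan: &Plan) -> Vec<Plan> {
    let mut out = Vec::new();
    if let Plan::Join { left, right } = plan {
        // Rule 1, commutativity: A JOIN B => B JOIN A.
        out.push(Plan::Join { left: right.clone(), right: left.clone() });
        // Rule 2, associativity: (A JOIN B) JOIN C => A JOIN (B JOIN C).
        if let Plan::Join { left: a, right: b } = &**left {
            out.push(Plan::Join {
                left: a.clone(),
                right: Box::new(Plan::Join { left: b.clone(), right: right.clone() }),
            });
        }
        // Apply the same rules inside each child.
        for l in neighbors(left) {
            out.push(Plan::Join { left: Box::new(l), right: right.clone() });
        }
        for r in neighbors(right) {
            out.push(Plan::Join { left: left.clone(), right: Box::new(r) });
        }
    }
    out
}

/// Explore every plan reachable through the rules, then return the cheapest.
fn optimize(initial: Plan) -> Plan {
    let mut seen = vec![initial];
    let mut i = 0;
    while i < seen.len() {
        for p in neighbors(&seen[i]) {
            if !seen.contains(&p) {
                seen.push(p);
            }
        }
        i += 1;
    }
    seen.into_iter()
        .min_by(|a, b| cost(a).total_cmp(&cost(b)))
        .expect("the initial plan is always present")
}
```

For scans of 1000, 10, and 100 rows named a, b, and c, initially joined as `join(join(a, b), c)`, this toy cost model gives 1310 for the initial plan and 1220 for the plan `optimize` returns, which joins b and c first.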

The primary objective of optd is to explore the potential challenges involved in effectively implementing a cost-based optimizer for real-world production usage. optd implements the Columbia Cascades optimizer framework based on Yongwen Xu's master's thesis. Besides Cascades, optd also provides a heuristics optimizer implementation for testing purposes.

The other key objective is to implement a flexible optimizer framework that supports adaptive query optimization (a.k.a. reoptimization) and adaptive query execution. optd executes a query, captures runtime information, and uses this data to guide subsequent plan space searches and cost model estimations. This progressive optimization approach ensures that queries are continuously improved, and allows the optimizer to explore a large plan space.
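
The feedback loop described above can be sketched roughly as follows. This is an illustrative Rust model only, not optd's implementation: `RuntimeStats`, `Candidate`, `pick`, and the string keys naming the first join are all hypothetical, and "cost" is reduced to the row count of the first join.

```rust
use std::collections::HashMap;

/// Hypothetical store of cardinalities observed at runtime, keyed by a
/// name for the pair of tables that a plan joins first (e.g. "a,b").
#[derive(Default)]
struct RuntimeStats {
    observed: HashMap<String, f64>,
}

impl RuntimeStats {
    /// Record the row count seen when a join actually executed.
    fn record(&mut self, key: &str, rows: f64) {
        self.observed.insert(key.to_string(), rows);
    }

    /// Prefer an observed cardinality; fall back to the static estimate.
    fn rows_or(&self, key: &str, estimate: f64) -> f64 {
        self.observed.get(key).copied().unwrap_or(estimate)
    }
}

/// A candidate plan, identified by its first join and a static row estimate.
struct Candidate {
    first_join: &'static str,
    estimated_rows: f64,
}

/// Pick the candidate whose first join is cheapest, using runtime
/// observations wherever they exist.
fn pick<'a>(candidates: &'a [Candidate], stats: &RuntimeStats) -> &'a Candidate {
    candidates
        .iter()
        .min_by(|x, y| {
            let cx = stats.rows_or(x.first_join, x.estimated_rows);
            let cy = stats.rows_or(y.first_join, y.estimated_rows);
            cx.total_cmp(&cy)
        })
        .expect("at least one candidate")
}
```

With candidates "a,b" (estimated 100 rows) and "b,c" (estimated 500 rows), `pick` first chooses "a,b". If execution then records that "a,b" actually produced 5000 rows, the next call to `pick` switches to "b,c", which is the kind of correction the adaptive loop is meant to provide.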

Currently, optd is integrated into Apache Arrow Datafusion as a physical optimizer. It receives the logical plan from Datafusion, performs various physical optimizations (e.g., determining the join order), and then converts the result into a Datafusion physical plan for execution.

optd is a research project and is still evolving. It should not be used in production. The code is licensed under MIT.

Get Started

There are two demos you can run with optd. More information is available in the docs.

cargo run --release --bin optd-adaptive-tpch-q8
cargo run --release --bin optd-adaptive-three-join

You can also run the Datafusion CLI to experiment with optd interactively.

cargo run --bin datafusion-optd-cli

Documentation

The documentation is available in the mdbook format in the docs directory.

Structure

  • datafusion-optd-cli: A patched Apache Arrow Datafusion CLI (version 32) that calls into optd.
  • datafusion-optd-bridge: An implementation of the Apache Arrow Datafusion query planner that acts as a bridge between optd and Apache Arrow Datafusion.
  • optd-core: The core framework of optd.
  • optd-datafusion-repr: Representation of Apache Arrow Datafusion plan nodes in optd.
  • optd-adaptive-demo: Demo of the adaptive optimization capabilities of optd. More information is available in the docs.
  • optd-sqlplannertest: Planner test of optd based on risinglightdb/sqlplannertest-rs.

Related Works

More Repositories

  • bustub: The BusTub Relational Database Management System (Educational) (C++, 3,755 stars)
  • peloton: The Self-Driving Database Management System (C++, 2,026 stars)
  • noisepage: Self-Driving Database Management System from Carnegie Mellon University (C++, 1,735 stars)
  • ottertune: The automatic DBMS configuration tool (Python, 1,202 stars)
  • 15445-bootcamp: A basic introduction to coding in modern C++. (C++, 601 stars)
  • dbdb.io: The On-line Database of Databases (Python, 465 stars)
  • benchbase: Multi-DBMS SQL Benchmarking Framework via JDBC (Java, 400 stars)
  • mongodb-d4: Automatic MongoDB database designer (Python, 54 stars)
  • cmdbac: CMDBAC - Carnegie Mellon Database Application Catalog (Python, 35 stars)
  • libfixeypointy: Fixed-Point Decimal Library from Carnegie Mellon University (C++, 32 stars)
  • noisepage-stats: DBMS Performance & Correctness Testing Framework (Python, 30 stars)
  • peloton-design: Peloton Design Docs (27 stars)
  • peloton-test: SQL Testing Framework for the Peloton DBMS (Java, 20 stars)
  • pgextmgrext: A Postgres Extension to Manage Extensions! (As well as some random stuff) (Rust, 13 stars)
  • 15721-s24-cache1: 15-721 Spring 2024 - Cache #1 (Rust, 12 stars)
  • noisepage-pilot: Because "pilot" was a better name than "brain" (Jupyter Notebook, 8 stars)
  • dbgym (Python, 7 stars)
  • 15721-s24-scheduler1: 15-721 Spring 2024 - Scheduler #1 (Rust, 5 stars)
  • noisepage-control: NoisePage Autonomous Control Plane Infrastructure (4 stars)
  • terrier-dashboard (JavaScript, 4 stars)
  • noisepage-forecast (Python, 4 stars)
  • pgext-analyzer: PostgreSQL Extensions Analyzer (C++, 2 stars)
  • 15721-s24-cache2: 15-721 Spring 2024 - Cache #2 (Rust, 1 star)
  • 15721-s24-catalog2: 15-721 Spring 2024 - Catalog #2 (Rust, 1 star)
  • benchpress: Benchpress Demo (SIGMOD 2015) (JavaScript, 1 star)
  • noisepage-testfiles: Test Files & Data Sets for NoisePage DBMS Project (TSQL, 1 star)
  • 15721-s24-scheduler2 (Rust, 1 star)