• Stars
    star
    4,927
  • Rank 8,133 (Top 0.2 %)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created almost 8 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast web applications through dynamic, partially-stateful dataflow

Noria: data-flow for high-performance web applications

noria on crates.io noria on docs.rs noria-server on crates.io noria-server on docs.rs Azure Status

Noria is a new streaming data-flow system designed to act as a fast storage backend for read-heavy web applications based on Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18. It acts like a database, but precomputes and caches relational query results so that reads are blazingly fast. Noria automatically keeps cached results up-to-date as the underlying data, stored in persistent base tables, change. Noria uses partially-stateful data-flow to reduce memory overhead, and supports dynamic, runtime data-flow and query change.

Noria comes with a MySQL adapter that implements the binary MySQL protocol. This lets any application that currently talks to MySQL or MariaDB switch to Noria with minimal effort. For example, running a Lobsters-like workload that issues the equivalent SQL queries to the real Lobsters website, Noria improves throughput supported by 5x:

Noria speeds up Lobsters queries by 5x

At a high level, Noria takes a set of parameterized SQL queries (think prepared statements), and produces a data-flow program that maintains materialized views for the output of those queries. Reads now become fast lookups directly into these materialized views, as if the value had been directly cached in memcached. The views are then kept up-to-date incrementally through the data-flow, which yields high write throughput.

Running Noria

Like most databases, Noria follows a server-client model where many clients connect to a (potentially distributed) server. The server in this case is the noria-server binary, and must be started before clients can connect. Noria also uses Apache ZooKeeper to announce the location of its servers, so ZooKeeper must be running.

You (currently) need nightly Rust to build noria-server. This will be arranged for automatically if you're using rustup.rs. To build noria-server, run

$ cargo build --release --bin noria-server

You may need to install some dependencies for the above to work:

  • clang
  • libclang-dev
  • libssl-dev
  • liblz4-dev
  • build-essential

To start a long-running noria-server instance, ensure that ZooKeeper is running, and then run:

$ cargo r --release --bin noria-server -- --deployment myapp --no-reuse --address 172.16.0.19 --shards 0

myapp here is a deployment. Many noria-server instances can operate in a single deployment at the same time, and will share the workload between them. Workers in the same deployment automatically elect a leader and discovery each other via ZooKeeper.

Interacting with Noria

There are two primary ways to interact with Noria: through the Rust bindings or through the MySQL adapter. They both automatically locate the running worker through ZooKeeper (use -z if ZooKeeper is not running on localhost:2181).

Rust bindings

The noria crate provides native Rust bindings to interact with noria-server. See the noria documentation for detailed instructions on how to use the library. You can also take a look at the example Noria program using Noria's client API. You can also see a self-contained version that embeds noria-server (and doesn't require ZooKeeper) in this example.

MySQL adapter

We have built a MySQL adapter for Noria that accepts standard MySQL queries and speaks the MySQL protocol to make it easy to try Noria out for existing applications. Once the adapter is running (see its README), you should be able to point your application at localhost:3306 to send queries to Noria. If your application crashes, this is a bug, and we would appreciate it if you open an issue. You may also want to try to disable automatic re-use (with --no-reuse) or sharding (with --shards 0) in case those are misbehaving.

CLI and Web UI

You can manually inspect the data stored in Noria using any MySQL client (e.g., the mysql CLI), or use Noria's own web interface.

Noria development

Noria is a large piece of software that spans many sub-crates and external tools (see links in the text above). Each sub-crate is responsible for a component of Noria's architecture, such as external API (noria), mapping SQL to data-flow (server/mir), and executing data-flow operators (server/dataflow). The code in server/src/ is the glue that ties these pieces together by establishing materializations, scheduling data-flow work, orchestrating Noria program changes, handling failovers, etc.

server/src/lib.rs has a pretty extensive comment at the top of it that goes through how the Noria internals fit together at an implementation level. While it occasionally lags behind, especially following larger changes, it should serve to get you familiarized with the basic building blocks relatively quickly.

The sub-crates each serve a distinct role:

  • noria/: everything that an external program communicating with Noria needs. This includes types used in RPCs as arguments/return types, as well as code for discovering Noria workers through ZooKeeper, establishing a connection to Noria through ZooKeeper, and invoking the various RPC exposed by the Noria controller (server/src/controller.rs). The noria sub-crate also contains a number of internal data-structures that must be shared between the client and the server like DataType (Noria's "value" type). These are annotated with #[doc(hidden)], and should be easy to spot in noria/src/lib.rs.

  • applications/: a collection of various Noria benchmarks. The most frequently used one is vote, which runs the vote benchmark from Β§8.2 of the OSDI paper. You can run it in a bunch of different ways (--help should be useful), and with many different backends. The localsoup backend is the one that's easiest to get up and running with.

  • server/src/: the Noria server, including high-level components such as RPC handling, domain scheduling, connection management, and all the controller operations (listening for heartbeats, handling failed workers, etc.). It contains two notable sub-crates:

    • dataflow/: the code that implements the internals of the data-flow graph. This includes implementations of the different operators (ops/), "special" operators like leaf views and sharders (node/special/), implementations of view storage (state/), and the code that coordinates execution of control, data, and backfill messages within a thread domain (domain/).
    • mir/: the code that implements Noria's SQL-to-dataflow mapping. This includes resolving columns and keys, creating dataflow operators, and detecting reuse opportunities, and triggering migrations to make changes after new SQL queries have been added. @ms705 is the primary author of this particular subcrate, and it builds largely upon nom-sql.
    • common/: data-structures that are shared between the various server sub-crates.

To run the test suite, use:

$ cargo test

Build and open the documentation with:

$ cargo doc --open

Once noria-server is running, its API is available on port 6033 at the specified listen address.

Alternatively, you can discover Noria's REST API listen address and port through ZooKeeper via this command:

$ cargo run --bin noria-zk -- \
    --show --deployment myapp
    | grep external | cut -d' ' -f4

A basic graphical UI runs at http://IP:PORT/graph.html and shows the running data-flow graph. You can also deploy Noria's more advanced web UI that serves the REST API endpoints in a human-digestible form and includes the graph visualization.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

More Repositories

1

xv6-public

xv6 OS
C
7,277
star
2

xv6-riscv

Xv6 for RISC-V
C
5,888
star
3

biscuit

Biscuit research OS
Go
2,394
star
4

xv6-riscv-book

Text describing xv6 on RISC-V
TeX
538
star
5

RVirt

RISC-V hypervisor written in Rust
Rust
329
star
6

xv6-book

Commentary for xv6-public
Perl
235
star
7

fscq

FSCQ is a certified file system written and proven in Coq
Coq
231
star
8

xv6-riscv-fall19

6.S081/6.828 lab repo for fall 2019
C
191
star
9

perennial

Verifying concurrent crash-safe systems
Coq
129
star
10

6.828-qemu

qemu patched for debugging, used for 6.828
C
76
star
11

noria-ui

Web UI for Noria clusters
JavaScript
67
star
12

noria-mysql

MySQL/MariaDB protocol shim for Noria
Rust
66
star
13

go-journal

Verified, concurrent, crash-safe transaction system
Go
42
star
14

mcqc

A Gallina compiler with C++17 as an intermediate representation
Haskell
40
star
15

go-nfsd

Fast NFS server implemented using GoJournal
Go
34
star
16

ward

C++
29
star
17

daisy-nfsd

DaisyNFS is an NFS server verified using Dafny and Perennial.
Dafny
26
star
18

sigmaos

Go
23
star
19

scalefs

C
21
star
20

dsrg

Distributed Systems Reading Group
CSS
20
star
21

secfs-skeleton

Skeleton code for new 6.858 final project --- an encrypted and authenticated file system
Python
19
star
22

perflock

RWMutex for sharing of multicore machines.
C
15
star
23

6.826-2017-labs

Coq
13
star
24

vmvcc

Go
13
star
25

6.826-2020-labs

Lab assignments for 6.826
Coq
12
star
26

6.S060-labs

Programming labs for 6.S060 (Foundations of Computer Security).
Python
12
star
27

what

An improved version of `w`
Python
12
star
28

argosy

Proving crash safety for systems with layered recovery
Coq
11
star
29

cspec

Verifying concurrent code with layers and movers
Coq
11
star
30

6.826-2019-labs

Lab assignments for 6.826
Coq
10
star
31

syndicate

Syndicate multiplexes several distributed master-slave applications onto a single cluster of machines.
Go
9
star
32

gokv

Go
7
star
33

mailbot

Bot to send email notifications when pushing to GitHub
Shell
7
star
34

deepspec-pocs

Coq
6
star
35

6.1600-labs

Student lab assignments for MIT 6.1600
Python
6
star
36

spectrebench

C++
5
star
37

6.5660-lab-2023

Python
4
star
38

grove-artifact

Python
3
star
39

noria-benchmarks

Experiment scripts and results for Soup
Rust
3
star
40

6.1600-notes

TeX
3
star
41

perennial-examples

Examples verified using Perennial
Go
3
star
42

csail-events-slack

Simple Slack webhook for posting notifications about upcoming CSAIL seminars
Ruby
3
star
43

grove

Experiments in verifying distributed systems with Iris
2
star
44

new-students

Welcome for new PDOS students to get access to the organization
1
star