  • Stars: 934
  • Rank 48,927 (Top 1.0 %)
  • Language: Rust
  • Created almost 8 years ago
  • Updated over 2 years ago

Repository Details

pure rust io_uring library, built on libc, thread & async friendly, misuse resistant

rio

bindings for io_uring, the hottest thing to happen to linux IO in a long time.

Soundness status

rio aims to leverage Rust's compile-time checks to be misuse-resistant compared to io_uring interfaces in other languages, but users should beware that use-after-free bugs are still possible without unsafe when using rio. Completion borrows the buffers involved in a request, and its destructor blocks in order to delay the freeing of those buffers until the corresponding request has completed; but it is considered safe in Rust for an object's lifetime and borrows to end without its destructor running, and this can happen in various ways, including through std::mem::forget. Be careful not to let completions leak in this way, and if Rust's soundness guarantees are important to you, you may want to avoid this crate.
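To make the leak scenario concrete, here is a minimal std-only sketch. `Guard` is a hypothetical stand-in for a completion-like type, not rio's `Completion`: its destructor only sets a flag, whereas rio's blocks until the request has finished. The point it shows is that `std::mem::forget` is safe code and still skips the destructor.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical stand-in for a completion-like guard. Its destructor is the
/// only place that records that the "request" finished.
struct Guard<'a> {
    finished: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // A real completion would block here until the kernel is done with
        // the borrowed buffers. This sketch only sets a flag.
        self.finished.set(true);
    }
}

/// Returns whether the destructor ran when the guard goes out of scope.
fn drop_normally() -> bool {
    let finished = Cell::new(false);
    {
        let _guard = Guard { finished: &finished };
    } // `_guard` leaves scope here, so `Drop::drop` runs
    finished.get()
}

/// Returns whether the destructor ran when the guard is leaked.
fn leak_with_forget() -> bool {
    let finished = Cell::new(false);
    let guard = Guard { finished: &finished };
    mem::forget(guard); // no `unsafe` needed, yet the destructor never runs
    finished.get()
}

fn main() {
    assert!(drop_normally());
    assert!(!leak_with_forget());
    println!("dropped: {}, forgotten: {}", drop_normally(), leak_with_forget());
}
```

With a real in-flight request, the forgotten case is the dangerous one: the borrow ends, the buffer can be freed, and the kernel may still write into it.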

Innovations

  • only relies on libc, no need for c/bindgen to complicate things, nobody wants that
  • the completions work great with threads or an async runtime (Completion implements Future)
  • uses Rust marker traits to guarantee that a buffer will never be written into unless it is writable memory. (prevents you from trying to write data into static read-only memory)
  • no need to mess with IoSlice / libc::iovec directly. rio maintains these in the background for you.
  • If left to its own devices, io_uring will allow you to submit more IO operations than would actually fit in the completion queue, allowing completions to be dropped and causing leaks of any userspace thing waiting for the completion. rio exerts backpressure on submitters when the number of in-flight requests reaches this threshold, to guarantee that no completions will be dropped due to completion queue overflow.
  • rio will handle submission queue submissions automatically. If you start waiting for a Completion, rio will make sure that we have already submitted at least this request to the kernel. Other io_uring libraries force you to handle this manually, which is another possible source of misuse.
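The backpressure idea from the list above can be sketched with nothing but `Mutex` and `Condvar`. This is an illustration under assumptions, not rio's implementation: `InFlight`, `acquire`, `try_acquire` and `release` are hypothetical names, and the capacity stands in for the completion queue size.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical in-flight limiter: submitters block once `capacity`
/// requests are outstanding, so completions can never outnumber the slots.
struct InFlight {
    capacity: usize,
    count: Mutex<usize>,
    freed: Condvar,
}

impl InFlight {
    fn new(capacity: usize) -> Self {
        InFlight { capacity, count: Mutex::new(0), freed: Condvar::new() }
    }

    /// Called before submitting: waits until a slot is free, then takes it.
    fn acquire(&self) {
        let mut count = self.count.lock().unwrap();
        while *count >= self.capacity {
            count = self.freed.wait(count).unwrap();
        }
        *count += 1;
    }

    /// Non-blocking variant: returns false when every slot is taken.
    fn try_acquire(&self) -> bool {
        let mut count = self.count.lock().unwrap();
        if *count >= self.capacity {
            return false;
        }
        *count += 1;
        true
    }

    /// Called when a completion is reaped: frees a slot and wakes one waiter.
    fn release(&self) {
        let mut count = self.count.lock().unwrap();
        *count -= 1;
        self.freed.notify_one();
    }
}

fn main() {
    let limiter = Arc::new(InFlight::new(2));
    limiter.acquire();
    limiter.acquire();
    assert!(!limiter.try_acquire()); // both slots are in flight

    // A third submitter has to wait until a completion frees a slot.
    let waiter = {
        let limiter = Arc::clone(&limiter);
        thread::spawn(move || limiter.acquire())
    };
    limiter.release();
    waiter.join().unwrap();

    assert!(!limiter.try_acquire()); // the waiter took the freed slot
}
```

Blocking the submitter is the design choice here: it trades submission latency for the guarantee that nothing waiting on a completion gets stranded.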

This is intended to be the core of sled's writepath. It is built with a specific high-level application in mind: a high performance storage engine and replication system.

What's io_uring?

io_uring is the biggest thing to happen to the linux kernel in a very long time. It will change everything. Anything that uses epoll right now will be rewritten to use io_uring if it wants to stay relevant. It started as a way to do real async disk IO without needing to use O_DIRECT, but its scope has expanded and it will continue to support more and more kernel functionality over time due to its ability to batch large numbers of different syscalls. In kernel 5.5 support is added for more networking operations like accept(2), sendmsg(2), and recvmsg(2). In 5.6 support is being added for recv(2) and send(2). io_uring has been measured to dramatically outperform epoll-based networking, and the gap widens under heavier load. I started rio to gain an early deep understanding of this amazing new interface, so that I could use it ASAP and responsibly with sled.

io_uring unlocks the following kernel features:

  • fully-async interface for a growing number of syscalls
  • async disk IO without using O_DIRECT as you have to do with AIO
  • batching hundreds of disk and network IO operations into a single syscall, which is especially wonderful in a post-meltdown/spectre world where our syscalls have dramatically slowed down
  • 0-syscall IO operation submission, if configured in SQPOLL mode
  • configurable completion polling for trading CPU for low latency
  • Allows expression of sophisticated 0-copy broadcast semantics, similar to splice(2) or sendfile(2) but working with many file-like objects without ever needing to bounce memory and mappings into userspace en-route.
  • Allows IO buffers and file descriptors to be registered for cheap reuse (remapping buffers and file descriptors for use in the kernel has a significant cost).

To read more about io_uring, check out:

For some slides with interesting io_uring performance results, check out slides 43-53 of this presentation deck by Jens.

why not use those other Rust io_uring libraries?

  • they haven't copied rio's features yet, which you pretty much have to use anyway to responsibly use io_uring due to the sharp edges of the API. All of the libraries I've seen as of January 13 2020 make it easy to overflow the completion queue and easy to express use-after-frees, and they don't seem to be async-friendly, etc...

examples that will be broken in the next day or two

async tcp echo server:

use std::{
    io,
    net::{TcpListener, TcpStream},
};

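// copies whatever arrives on `a` over to `b` until `a` is closed
async fn proxy(ring: &rio::Rio, a: &TcpStream, b: &TcpStream) -> io::Result<()> {
    let buf = vec![0_u8; 512];
    loop {
        // sockets are not seekable, so the offset argument is just 0
        let read_bytes = ring.read_at(a, &buf, 0).await?;
        if read_bytes == 0 {
            // a zero-length read means the peer closed the connection
            return Ok(());
        }
        let buf = &buf[..read_bytes];
        ring.write_at(b, &buf, 0).await?;
    }
}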

fn main() -> io::Result<()> {
    let ring = rio::new()?;
    let acceptor = TcpListener::bind("127.0.0.1:6666")?;

    extreme::run(async {
        // kernel 5.5 and later support TCP accept
        loop {
            let stream = ring.accept(&acceptor).await?;
            dbg!(proxy(&ring, &stream, &stream).await);
        }
    })
}
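The echo server hands its future to `extreme::run`. For readers wondering what a runner of that kind has to do, here is a rough std-only sketch of a blocking executor. It is not extreme's source: `block_on`, `ThreadWaker` and `YieldOnce` are names made up for this illustration, and the approach (park the thread until the waker unparks it) is the one shown in the standard library's `Wake` documentation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread that is blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Polls `fut` on the current thread until it finishes, sleeping in between.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until something calls `wake` on our waker.
            Poll::Pending => thread::park(),
        }
    }
}

/// Test future that returns `Pending` once before completing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.yielded {
            Poll::Ready("done")
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

fn main() {
    assert_eq!(block_on(async { 1 + 2 }), 3);
    assert_eq!(block_on(YieldOnce { yielded: false }), "done");
}
```

Anything that implements `Future`, which per the list above includes rio's `Completion`, can be driven by a loop of this shape.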

file reading:

let ring = rio::new().expect("create uring");
let file = std::fs::File::open("file").expect("openat");
let data: &mut [u8] = &mut [0; 66];
let at = 0; // byte offset in the file to read from
let completion = ring.read_at(&file, &data, at);

// if using threads
completion.wait()?;

// if using async
completion.await?

file writing:

let ring = rio::new().expect("create uring");
let file = std::fs::File::create("file").expect("openat");
let to_write: &[u8] = &[6; 66];
let at = 0; // byte offset in the file to write at
let completion = ring.write_at(&file, &to_write, at);

// if using threads
completion.wait()?;

// if using async
completion.await?
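The read and write snippets differ in what kind of buffer they accept, which is the marker-trait guarantee mentioned under Innovations. A std-only sketch of that idea follows. The trait names `ReadableBuf` and `WritableBuf` and the `simulated_*` functions are hypothetical, and rio's real traits and bounds may differ.

```rust
/// Memory that may be read from (the source of a write request).
trait ReadableBuf: AsRef<[u8]> {}

/// Memory that may be written into (the destination of a read request).
trait WritableBuf: AsMut<[u8]> {}

// Blanket impls: the markers simply follow what the type already allows.
impl<T: ?Sized + AsRef<[u8]>> ReadableBuf for T {}
impl<T: ?Sized + AsMut<[u8]>> WritableBuf for T {}

/// Stand-in for a read request: fills the destination and returns its length.
fn simulated_read<B: WritableBuf + ?Sized>(dst: &mut B, fill: u8) -> usize {
    let bytes: &mut [u8] = dst.as_mut();
    bytes.fill(fill);
    bytes.len()
}

/// Stand-in for a write request: only looks at the source bytes.
fn simulated_write<B: ReadableBuf + ?Sized>(src: &B) -> usize {
    let bytes: &[u8] = src.as_ref();
    bytes.len()
}

fn main() {
    // Read-only static memory is fine as the source of a write.
    static READ_ONLY: &[u8] = b"static bytes";
    assert_eq!(simulated_write(READ_ONLY), 12);

    // A heap buffer we own is writable, so it can receive a read.
    let mut buf = vec![0_u8; 4];
    assert_eq!(simulated_read(&mut buf, 7), 4);
    assert_eq!(buf, [7, 7, 7, 7]);

    // This is rejected at compile time, because `&[u8]` is not `AsMut<[u8]>`:
    // let mut ro: &[u8] = b"x";
    // simulated_read(&mut ro, 7);
}
```

The mistake of reading into read-only memory becomes a type error instead of a runtime fault.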

speedy O_DIRECT shi0t (try this at home / run the o_direct example)

use std::{
    fs::OpenOptions, io::Result,
    os::unix::fs::OpenOptionsExt,
};

const CHUNK_SIZE: u64 = 4096 * 256;

// `O_DIRECT` requires all reads and writes
// to be aligned to the block device's block
// size. 4096 might not be the best, or even
// a valid one, for yours!
#[repr(align(4096))]
struct Aligned([u8; CHUNK_SIZE as usize]);

fn main() -> Result<()> {
    // start the ring
    let ring = rio::new()?;

    // open output file, with `O_DIRECT` set
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .custom_flags(libc::O_DIRECT)
        .open("file")?;

    let out_buf = Aligned([42; CHUNK_SIZE as usize]);
    let out_slice: &[u8] = &out_buf.0;

    // the read destination has to be writable memory
    let mut in_buf = Aligned([42; CHUNK_SIZE as usize]);
    let in_slice: &mut [u8] = &mut in_buf.0;

    let mut completions = vec![];

    for i in 0..(10 * 1024) {
        let at = i * CHUNK_SIZE;

        // By setting the `Link` order,
        // we specify that the following
        // read should happen after this
        // write.
        let write = ring.write_at_ordered(
            &file,
            &out_slice,
            at,
            rio::Ordering::Link,
        );
        completions.push(write);

        let read = ring.read_at(&file, &in_slice, at);
        completions.push(read);
    }

    for completion in completions.into_iter() {
        completion.wait()?;
    }

    Ok(())
}
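The alignment comment in the example can be checked without touching a disk. The sketch below assumes a 4096-byte block size, which as noted above may not match your device, and `request_is_aligned` is a hypothetical helper, not part of rio. Requiring the offset and length to be aligned as well as the buffer address is a conservative assumption here; the exact rules depend on the filesystem and kernel.

```rust
const BLOCK_SIZE: usize = 4096; // assumed block size, check your device

#[repr(align(4096))]
struct Aligned([u8; BLOCK_SIZE]);

/// Returns true if the address is a multiple of `block_size`.
fn is_aligned(ptr: *const u8, block_size: usize) -> bool {
    (ptr as usize) % block_size == 0
}

/// Conservative check: buffer address, buffer length and file offset
/// all have to be multiples of the block size.
fn request_is_aligned(buf: &[u8], offset: u64, block_size: usize) -> bool {
    is_aligned(buf.as_ptr(), block_size)
        && buf.len() % block_size == 0
        && offset % block_size as u64 == 0
}

fn main() {
    let buf = Aligned([0; BLOCK_SIZE]);

    // `repr(align)` raises the alignment of the whole struct.
    assert_eq!(std::mem::align_of::<Aligned>(), 4096);

    // Aligned buffer, aligned offset: acceptable.
    assert!(request_is_aligned(&buf.0, 8192, BLOCK_SIZE));

    // Slicing off one byte shifts the pointer and shortens the length.
    assert!(!request_is_aligned(&buf.0[1..], 8192, BLOCK_SIZE));

    // An unaligned file offset is rejected as well.
    assert!(!request_is_aligned(&buf.0, 100, BLOCK_SIZE));
}
```

A check like this is cheap to run before submitting, and gives a clearer failure than the error the kernel returns for a misaligned `O_DIRECT` request.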
