• Stars
    star
    118
  • Rank 299,923 (Top 6 %)
  • Language
    Rust
  • Created over 4 years ago
  • Updated over 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

An opinionated guide that tries to help you choose the best way to store contiguous data in Rust

A Guide to Contiguous Data in Rust

Many modern languages have collections called "array," "slice," or "vector." Rust has all three, plus many third-party libraries! This is an opinionated guide that tries to help you choose the best way to store contiguous data in Rust. It covers tools from Rust's core and standard library as well as third-party crates that meet more specific needs.

Contiguous data is when multiple pieces of data are stored next to each other in memory. It is often a great way to store collections, because it can provide great cache locality and branch prediction.

Tradeoffs

These are things to think about when choosing a technique for storing contiguous data.

Mutability

There are a few broad levels of mutability for contiguous data in Rust:

  • Completely immutable
  • Fixed length, mutable contents
  • Mutable length and contents

Allocation

Some solutions can be used with const and static, some solutions can use the memory of the stack, and some need to use memory allocated by an allocator (I'll call this "the heap" although I don't think that it technically has to be a heap).

Splitting

There are several ways that you can split contiguous data in Rust. Reviewing References and Borrowing from TRPL might be helpful to understanding this better. Since any contiguous data can be turned into a slice, the slice splitting rules generally apply for borrowed data. The bytes crate provides an interesting way to split owned data.

Extend vs. refuse

When you run out of memory in your contiguous data, should the data structure ask to allocate more memory or should it refuse to give you more memory? If you want to intentionally use a capped amount of memory, it might be better to refuse to extend the memory. This could be useful in embedded and/or real-time programming.

Types

Most of the solutions below can store any type but there are a couple exceptions.

Solutions

The solutions are roughly in order of increasing power, according to the Principle of Least Power. So, consider using the first solution that will solve your problem.

Fixed-size array: [T; N]

When you create a fixed-size array in Rust, you must specify the size (N) at compile time.

Given a mutable fixed-size array, you can change the content, but there is no way to change the size.

const MY_DATA: [i8; 3] = [1, 2, 3];

fn main() {
    let mut my_data = [2; 3];
    my_data[0] = 1;
    my_data[2] = 3;
    assert_eq!(MY_DATA, my_data);
}

Because the size is known at compile time, a fixed-size array can be stored anywhere, even to be used as a const. However, it might not be a good idea to store large fixed-size arrays on the stack because it could cause a stack overflow.

The length of the array is actually part of its type. This means that the following code, for example, won't compile because you can't compare two variables with different types:

const MY_DATA_3: [i8; 3] = [1, 2, 3];
const MY_DATA_4: [i8; 4] = [1, 2, 3, 4];

fn main() {
    assert_eq!(MY_DATA_3, MY_DATA_4);
}

An array cannot be split.

See also Array types from The Rust Reference.

Slice: &[T] or &mut [T]

In Rust, a slice is "a dynamically-sized view into a contiguous sequence." In contrast to an array, the length of a slice isn't known at compile time.

Given a mutable slice, you can change the content, but there is no way to change the length. When we take a mutable slice to an array, we're actually modifying the data in the array itself. That's why my_array needs to be declared as mut below.

fn main() {
    let mut my_array: [i8; 3] = [1, 2, 0];
    let my_slice: &mut [i8] = &mut my_array[1..];
    my_slice[1] = 3;
    assert_eq!(my_array, [1, 2, 3]);
}

Unlike with an an array, two slices of different lengths are the same type, so the code below works:

const MY_SLICE: &[i8] = &[1, 2, 3];

fn main() {
    let my_slice: &[i8] = &[1, 2, 3, 4];
    assert_ne!(MY_SLICE, my_slice);
}

A slice can refer to memory anywhere including as a const.

You can always create as many overlapping shared slices (&[T]) as you want, although you can't mutate them:

const MY_DATA: [i8; 3] = [2; 3];

fn main() {
    // Compare two overlapping slices
    assert_eq!(&MY_DATA[..2], &MY_DATA[1..]);
}

You can split a mutable slice into multiple non-overlapping mutable slices:

fn main() {
    let my_data = &mut [0, 2, 3, 0];
    let (left, right) = my_data.split_at_mut(2);
    left[0] = 1;
    right[1] = 4;
    assert_eq!(my_data, &[1, 2, 3, 4]);
}

See also TRPL on slices.

Boxed slice: Box<[T]>

With a boxed slice, you can create arrays at run time without knowing the length at compile time. In most cases, you could probably just use Vec instead. However, boxed slices do have a few use cases:

  • Writing a custom buffer (e.g. std::io::BufReader)
  • If you want to store data slightly more efficiently than a Vec (a boxed slice doesn't store a capacity)
  • If you generally want to prevent users from resizing the data

The code below won't compile; you can't collect an iterator into a fixed-size array because the number of elements in an iterator isn't generally known at compile time.

fn main() {
    let my_data: [i8; 3] = [1, 2, 3].iter().cloned().collect();
}

...but you can collect into a boxed slice:

const MY_DATA: [i8; 3] = [1, 2, 3];


fn main() {
    let my_data: Box<[i8]> = MY_DATA.iter().cloned().collect();

    // We need to take a slice b/c we can't compare boxed slices and arrays directly
    assert_eq!(&my_data[..], MY_DATA);
}

Because the length isn't known at compile time, the data for a boxed slice can only be allocated with an allocator.

Vec

std::vec::Vec is the std way to do variable-length contiguous data. When you try to add data and there's not enough room, the data structure will allocate more memory for the data.

fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
    vec.push(3);
    
    assert_eq!(vec.len(), 3);
    assert_eq!(vec, [1, 2, 3]);
}

Because the length is dynamic and not known at compile time, the data for a Vec can only live on the heap.

We can split a Vec into two separate Vecs using the split_off method. However, doing this (surprisingly?) copies the data. According to user hanna-kruppe on the Rustlang forum,:

But as far as I know, there’s currently no way to tell the allocator β€œHey I got this piece of memory from you, now please pretend you gave it to me as two separate, contiguous allocations”

See also [Storing Lists of Values with Vectors](Storing Lists of Values with Vectors) from The Rust Programming Language.

smallvec

The smallvec provides an interface that is very close to Vec, but it will store small numbers of items on the stack instead of allocating on the heap. This is good if you suspect your data structure will normally be quite small but it may need to grow occasionally.

Whereas an array with type [T; 4] stores exactly 4 elements, a SmallVec of type SmallVec[T; 4] can store more than 4 elements. If it has 4 or fewer elements, it will be stored on the stack; otherwise, it will be stored on the heap.

use smallvec::{SmallVec, smallvec};
    
fn main() {
    // This SmallVec can hold up to 4 items on the stack:
    let mut v: SmallVec<[i32; 4]> = smallvec![1, 2, 0, 4];
    
    // It will automatically move its contents to the heap if
    // contains more than four items:
    v.push(5);
    
    // SmallVec points to a slice, so you can use normal slice
    // indexing and other methods to access its contents:
    v[2] = 3;
    assert_eq!(&[1, 2, 3, 4, 5], &v[..]);
}

See also When is it β€œmorally” correct to use smallvec? from The Rust Programming Language Forum.

arrayvec

This crate will let you store a Vec inside of an array, but it won't let you exceed the length of the underlying array. This means that the data can live in the data segment, on the stack, or on the heap.

arrayvec

use arrayvec::ArrayVec;

fn main() {
    let mut array = ArrayVec::<[_; 2]>::new();
    assert_eq!(array.try_push(1), Ok(()));
    assert_eq!(array.try_push(2), Ok(()));
    assert!(array.try_push(3).is_err());
    assert_eq!(&array[..], &[1, 2]);
}

tinyvec

tinyvec provides 100%-safe alternatives to both arrayvec and smallvec. It works pretty much the same way except that the types must implement Default. Note that it doesn't provide the exact same APIs.

use tinyvec::ArrayVec;

fn main() {
    let mut array = ArrayVec::<[_; 2]>::new();
    array.push(1);
    array.push(2);
}
use tinyvec::ArrayVec;

fn main() {
    let mut array = ArrayVec::<[_; 2]>::new();
    array.push(1);
    array.push(2);
    array.push(3);
}

VecDeque

std::collections::VecDeque is "a double-ended queue implemented with a growable ring buffer." It's pretty similar to a Vec except that it can efficiently push and pop from both front and back. This means that VecDeque can be used as a queue or as a double-ended queue. See std::collections docs for more details.

use std::collections::VecDeque;

fn main() {
    let mut vec_deque = VecDeque::with_capacity(3);
    vec_deque.push_back(0);
    vec_deque.push_back(2);
    vec_deque.push_back(3);
    assert_eq!(Some(0), vec_deque.pop_front());
    vec_deque.push_front(1);

    assert_eq!(&vec_deque, &[1, 2, 3]);
}

Like a Vec, the VecDeque data lives on the heap.

The bytes crate

bytes provides Bytes, "an efficient container for storing and operating on contiguous slices of memory." One of its signature features is that, unlike Vec, it allows you to split ownership of data without copying. Unlike the other tools in this guide, the bytes crate can't store types T, it can only store u8.

use bytes::{BytesMut, BufMut};

fn main() {
    let mut whole = BytesMut::with_capacity(1024);
    whole.put(&[1u8, 2, 3, 4] as &[u8]);
    
    let mut right = whole.split_off(2);
    let mut left = whole;
    
    std::thread::spawn(move || {
        right[1] = 6;
        assert_eq!(&right[..], [3, 6]);
    });
    
    left[0] = 5;
    assert_eq!(&left[..], [5, 2]);
}

The data for the bytes data structures will live on the heap.

Honorable Mention

These are a few techniques that are good to know about, but the use cases are pretty narrow or there are major drawbacks.

  • A boxed array (Box<[T; N]>) is not the same as a boxed slice (Box<[T]>). You probably don't want to use a boxed array; among other things, there are a couple major ergonomics drawbacks at the time of writing. See also stack overflow.
  • Slice Deque is a double-ended queue that uses operating system virtual memory trickery to let you see the entire contents of the queue as a slice.
  • tendril is a crate from the Servo org boasting really advanced features for zero-copy parsing.
  • ndarray and nalgebra for math-heavy and multidimensional arrays.

More Resources

Thanks

Thanks to soruh_c10 on Twitch and s3bk and mejrs on users.rust-lang.org for suggestions and corrections!

More Repositories

1

global-data-in-rust

This guide explains how you can use "global data" in Rust
Rust
198
star
2

glx

Analyzing the Green Line Extension with OpenStreetMap
Rust
195
star
3

hmmm

Hidden Markov Models in Rust
Rust
66
star
4

rustdoc-katex-demo

This crate demonstrates an approach to including KaTeX in Rust docs
Rust
27
star
5

bitcoin-script-explorer

A tool to help programmers learn Bitcoin Script.
CSS
27
star
6

bfgs

A pure Rust implementation of the BFGS optimization algorithm
Rust
18
star
7

ndarray-csv

Easily read homogeneous CSV data into a 2D ndarray
Rust
17
star
8

future-by-example

Examples of common patterns using Future
Rust
15
star
9

bitcoin-nanopayment

A probabilistic Bitcoin nanopayments library in Node using the Bitcoin-Qt RPC API
JavaScript
14
star
10

exandria

A decentralized file sharing system that includes search
JavaScript
6
star
11

burnie

An SPV Bitcoin burn verification library built on webcoin
JavaScript
6
star
12

iced-plot

Rust
5
star
13

spin_on

Rust
5
star
14

tokio-fmt-encoder

A Tokio Encoder for Debug and Display
Rust
4
star
15

ssaas

Screen Saver as a Service
3
star
16

exploring-computation-graphs-in-rust

Rust
3
star
17

protozor

A P2P system for replicating streaming signed append-only logs
JavaScript
2
star
18

frunk-column

A prototype of columnar data storage using frunk HLists
Rust
2
star
19

burn-stream

A world-writable append-only log using burned bitcoins
JavaScript
2
star
20

swarmhub

swarmhub makes a webrtc-swarm act like a signalhub.
JavaScript
2
star
21

decentralized-cat-names

A decentralized set of cat names
JavaScript
2
star
22

pystan-cache

A PyStan wrapper that caches compiled models
Python
2
star
23

tokio-stdin

Read from stdin as a Tokio stream
Rust
1
star
24

fake-words

Python
1
star
25

rss_catalog

A collection of RSS feeds of articles organized in a machine-readable way.
Python
1
star
26

burn-name

Simple decentralized username registration by burning bitcoins
JavaScript
1
star
27

compass

Rust
1
star
28

webcoin-checkpoint-maker

Make checkpoints for your webcoin projects
JavaScript
1
star
29

error-test-sinks

Sink implementations for testing that simulate different types of errors
Rust
1
star
30

baby-bayes

Rust
1
star
31

wasm-math-experiment

Rust
1
star
32

cache-live-stream

Cache append-only live streams using levelup
JavaScript
1
star
33

burn-name-writer

Register usernames with BurnName
JavaScript
1
star
34

logistic-regression-in-rust

A toy implementation of logistic regression in Rust
Rust
1
star
35

tokio-stdout

Write to stdout as a Tokio sink
Rust
1
star