• Stars
    star
    170
  • Rank 223,357 (Top 5 %)
  • Language
    Rust
  • License
    MIT License
  • Created over 9 years ago
  • Updated over 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Various implementation strategies for “DOM-like” tree data structures in Rust.

No Maintenance Intended

No Maintenance Intended

This repository exists to publish the results of some experiments that I did over a couple week-ends that one time. This is not a software project that is being maintained over time.

If you would like to use this code, consider copying into your own source repository under the terms of the MIT license.

rust-forest

Various implementation strategies for “DOM-like” tree data structures in Rust.

“DOM-like” here means that data structures can be used to represent the parsed content of an HTML or XML document, like the DOM does, but don’t necessarily have the exact same API as the DOM. That is:

  • A tree is made up of nodes.
  • Each node has zero or more child nodes, which are ordered.
  • Each node has a no more than one parent, the node that it is a child of.
  • A node without a parent is called a root.
  • As a consequence, each node may also have siblings: its parent’s other children, if any.
  • From any given node, access to its parent, previous sibling, next sibling, first child, and last child (if any) can take no more than O(1) time.
  • Each node also has data associated to it, which for the purpose of this project is purely generic. For an HTML document, the data would be either the text of a text node, or the name and attributes of an element node.
  • The tree is mutable: nodes (with their sub-trees) can be inserted or removed anywhere in the tree.

In particular, the need to access parents means we can not use the obvious structure of child nodes being owned directly by their parent:

struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
    parent: /* ??? */
}

More generally, a tree with parent and sibling relationships (in addition to children relationships) can be viewed as a graph of a special kind, but still a graph that contains cycles. And Rust’s default ownership model does not support cycles easily. Therefore, we need a more involved strategy to manage the lifetime of nodes.

In Servo, DOM nodes are managed by the JavaScript garbage collector. Rust however currently does not have a tracing garbage collector of its own, and many Rust projects might want to manipulate DOM-like trees without embedding an entire JavaScript virtual machine.

This repository contains multiple crates that each implement a DOM-like data structure with a different approach and different trade-offs.

rctree

The lifetime of nodes is managed through reference counting. To avoid reference cycles which would cause memory leaks, the tree is asymmetric: each node holds optional strong references to its next sibling and first child, but only optional weak references to its parent, previous sibling, and last child.

Nodes are destroyed as soon as there is no strong reference left to them. The structure is such that holding a reference to the root is sufficient to keep the entire tree alive. However, if for example the only reference that exists from outside the tree is one that you use to traverse it, you will not be able to go back “up” the tree to ancestors and previous siblings after going “down”, as those nodes will have been destroyed.

Weak references to destroyed nodes are treated as if they were not set at all. (E.g. a node can become a root when its parent is destroyed.)

Since nodes are aliased (have multiple references to them), RefCell is used for interior mutability.

Advantages:

  • A single NodeRef user-visible type to manipulate the tree, with methods
  • Memory is freed as soon as it becomes unused (if parts of the tree are removed).

Disadvantages:

  • The tree can only be accessed from the thread is was created in.
  • Any tree manipulation, including read-only traversals, requires incrementing and decrementing reference counts, which causes run-time overhead
  • Nodes are allocated individually, which may cause memory fragmentation and hurt performance.

arena-tree

The lifetime of nodes is managed through an arena allocator.

Nodes are tied, through &'a T references, to the lifetime of the arena and are destroyed all at once when the arena is destroyed. The links between nodes are also &'a T references internally.

Since nodes are aliased (have multiple references to them), Cell is used for interior mutability.

Advantages:

  • Nodes still have methods, the arena can largely be ignored during tree manipulation (as long as it’s kept alive).
  • Less runtime overhead. (Allocation is fast, no reference count to increment or decrement.)
  • Nodes are allocated in large, continuous chunks of memory. This helps memory cache performance.

Disadvantages:

  • There is one more object (the arena) to keep around.
  • The tree can only be accessed from the thread is was created in.
  • Memory “leaks” when nodes are removed from the tree before the arena is destroyed. This can be alleviated by maintaining a free list of nodes that are not used anymore and can be reused, but only if these nodes have been freed explicitly. (Whereas reference counting and garbage collection would free unused nodes automatically.) But then nothing prevents references to freed nodes to be kept around. Another explicit mechanism (like tracking a generation number in each node) can be added to prevent (mostly) such references and accessing a freed or re-allocated node.

idtree

Similar to arena-tree, but the arena is simplified to a single Vec and numerical identifiers (indices in the vector) are used instead of &'a T references.

Advantages:

  • There is no RefCell, mutability is handled in a way much more idiomatic to Rust through unique (&mut) access to the arena.
  • The tree can be sent or shared across threads like a Vec. This enables e.g. parallel tree traversals.

Disadvantages:

  • Every access requires going through the arena, which can be cumbersome.
  • There is some runtime overhead over arena-tree because of bound checks.
  • A node ID from a given arena can be used in a different arena, which is likely to not cause an error and access an unrelated node.

More Repositories

1

rust-on-bbc-microbit

Running Rust code on a BBC micro:bit micro-controller
Rust
193
star
2

wtf-8

The WTF-8 encoding specification
HTML
96
star
3

rust-std-candidates

Candidates for inclusion in the Rust standard library
Rust
87
star
4

teensy-clock

A digital clock based on Teensy 3.2 and Rust
Rust
57
star
5

html5ever-python

Python bindings for html5ever, using CFFI
Python
43
star
6

victor

Victor makes vectors.
Rust
31
star
7

snippets

Pieces of code that are not big enough to be worth their own repository
Python
29
star
8

rust-utf8

Incremental, zero-copy UTF-8 decoding for Rust
Rust
22
star
9

rust-wtf8

Rust implementation of the WTF-8 encoding.
Rust
17
star
10

rust-pdf

Generating PDF files in pure Rust
Rust
15
star
11

hello-pyrust

A “Hello World” of calling Rust code from a Python program with CFFI, in order to show packaging issues
Python
11
star
12

exyr.org

Source code for my personal website
HTML
10
star
13

rust-movecell

`std::cell::Cell` for not-implicitly-copyable types.
Rust
8
star
14

gregor

Simple implementation of the Gregorian calendar for Rust
Rust
6
star
15

rust-webencodings

WIP rust implementation of the WHATWG Encoding Standard
Rust
4
star
16

pycairo

Mirror of http://cgit.freedesktop.org/pycairo/ , with patches
C
3
star
17

rust-rc

A copy of std::rc that runs on stable Rust with weak references
Rust
2
star
18

gedit-trailing-spaces

A simple gedit plugin that highlights trailing spaces in files and automatically clears the ones left by auto-indentation.
Python
2
star
19

data-urls

See instead:
HTML
2
star
20

azureblur

The triple box blur implementation from Firefox’s moz2d/Azure, with Python bindings.
C++
2
star
21

GitAtomizer

Build Atom feeds for git commits, with full diffs
Python
2
star
22

cairo-staticlib

The cairo graphics library, statically linked with minimal dependencies, for Rust crates
C
2
star
23

servo-style

Prototype replacement style system for Servo
CSS
1
star
24

css

Various proposals for the CSS specifications
Shell
1
star
25

riscv-qemu-demo

Rust
1
star
26

actions-playground

Rust
1
star
27

bascule

Logic circuit simulation where everything is NAND gates and D flip-flops
Rust
1
star
28

lselect

CSS3 Selectors for lxml
1
star
29

xml5ever

Extracting https://github.com/servo/html5ever/pull/125 into an independent library
Rust
1
star
30

run-nightly

Run stuff every night, to test on new Rust nightly builds.
Shell
1
star