• Stars
    star
    3,334
  • Rank 12,952 (Top 0.3 %)
  • Language
    Rust
  • License
    MIT License
  • Created almost 3 years ago
  • Updated 26 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Write expressive, high-performance parsers with ease.

Chumsky

crates.io crates.io License actions-badge

A parser library for humans with powerful error recovery.

Example usage with my own language, Tao

Note: Error diagnostic rendering is performed by Ariadne

Contents

Features

  • Lots of combinators!
  • Generic across input, output, error, and span types
  • Powerful error recovery strategies
  • Inline mapping to your AST
  • Text-specific parsers for both u8s and chars
  • Recursive parsers
  • Backtracking is fully supported, allowing the parsing of all known context-free grammars
  • Parsing of nesting inputs, allowing you to move delimiter parsing to the lexical stage (as Rust does!)
  • Built-in parser debugging

Example Brainfuck Parser

See examples/brainfuck.rs for the full interpreter (cargo run --example brainfuck -- examples/sample.bf).

use chumsky::prelude::*;

#[derive(Clone)]
enum Instr {
    Left, Right,
    Incr, Decr,
    Read, Write,
    Loop(Vec<Self>),
}

fn parser<'a>() -> impl Parser<'a, &'a str, Vec<Instr>> {
    recursive(|bf| choice((
        just('<').to(Instr::Left),
        just('>').to(Instr::Right),
        just('+').to(Instr::Incr),
        just('-').to(Instr::Decr),
        just(',').to(Instr::Read),
        just('.').to(Instr::Write),
        bf.delimited_by(just('['), just(']')).map(Instr::Loop),
    ))
        .repeated()
        .collect())
}

Other examples include:

Tutorial

Chumsky has a tutorial that teaches you how to write a parser and interpreter for a simple dynamic language with unary and binary operators, operator precedence, functions, let declarations, and calls.

What is a parser combinator?

Parser combinators are a technique for implementing parsers by defining them in terms of other parsers. The resulting parsers use a recursive descent strategy to transform a stream of tokens into an output. Using parser combinators to define parsers is roughly analogous to using Rust's Iterator trait to define iterative algorithms: the type-driven API of Iterator makes it more difficult to make mistakes and easier to encode complicated iteration logic than if one were to write the same code by hand. The same is true of parser combinators.

Why use parser combinators?

Writing parsers with good error recovery is conceptually difficult and time-consuming. It requires understanding the intricacies of the recursive descent algorithm, and then implementing recovery strategies on top of it. If you're developing a programming language, you'll almost certainly change your mind about syntax in the process, leading to some slow and painful parser refactoring. Parser combinators solve both problems by providing an ergonomic API that allows for rapidly iterating upon a syntax.

Parser combinators are also a great fit for domain-specific languages for which an existing parser does not exist. Writing a reliable, fault-tolerant parser for such situations can go from being a multi-day task to a half-hour task with the help of a decent parser combinator library.

Classification

Chumsky's parsers are recursive descent parsers and are capable of parsing parsing expression grammars (PEGs), which includes all known context-free languages. It is theoretically possible to extend Chumsky further to accept limited context-sensitive grammars too, although this is rarely required.

Error Recovery

Chumsky has support for error recovery, meaning that it can encounter a syntax error, report the error, and then attempt to recover itself into a state in which it can continue parsing so that multiple errors can be produced at once and a partial AST can still be generated from the input for future compilation stages to consume.

However, there is no silver bullet strategy for error recovery. By definition, if the input to a parser is invalid then the parser can only make educated guesses as to the meaning of the input. Different recovery strategies will work better for different languages, and for different patterns within those languages.

Chumsky provides a variety of recovery strategies (each implementing the Strategy trait), but it's important to understand that all of

  • which you apply
  • where you apply them
  • what order you apply them

will greatly affect the quality of the errors that Chumsky is able to produce, along with the extent to which it is able to recover a useful AST. Where possible, you should attempt more 'specific' recovery strategies first rather than those that mindlessly skip large swathes of the input.

It is recommended that you experiment with applying different strategies in different situations and at different levels of the parser to find a configuration that you are happy with. If none of the provided error recovery strategies cover the specific pattern you wish to catch, you can even create your own by digging into Chumsky's internals and implementing your own strategies! If you come up with a useful strategy, feel free to open a PR against the main repository!

Performance

Chumsky focuses on high-quality errors and ergonomics over performance. That said, it's important that Chumsky can keep up with the rest of your compiler! Unfortunately, it's extremely difficult to come up with sensible benchmarks given that exactly how Chumsky performs depends entirely on what you are parsing, how you structure your parser, which patterns the parser attempts to match first, how complex your error type is, what is involved in constructing your AST, etc. All that said, here are some numbers from the JSON benchmark included in the repository running on my Ryzen 7 3700x.

test chumsky ... bench:   4,782,390 ns/iter (+/- 997,208)
test pom     ... bench:  12,793,490 ns/iter (+/- 1,954,583)

I've included results from pom, another parser combinator crate with a similar design, as a point of reference. The sample file being parsed is broadly representative of typical JSON data and has 3,018 lines. This translates to a little over 630,000 lines of JSON per second.

Clearly, this is a little slower than a well-optimised hand-written parser: but that's okay! Chumsky's goal is to be fast enough. If you've written enough code in your language that parsing performance even starts to be a problem, you've already committed enough time and resources to your language that hand-writing a parser is the best choice going!

Planned Features

  • An optimised 'happy path' parser mode that skips error recovery & error generation
  • An even faster 'validation' parser mode, guaranteed to not allocate, that doesn't generate outputs but just verifies the validity of an input

Philosophy

Chumsky should:

  • Be easy to use, even if you don't understand exactly what the parser is doing under the hood
  • Be type-driven, pushing users away from anti-patterns at compile-time
  • Be a mature, 'batteries-included' solution for context-free parsing by default. If you need to implement either Parser or Strategy by hand, that's a problem that needs fixing
  • Be 'fast enough', but no faster (i.e: when there is a tradeoff between error quality and performance, Chumsky will always take the former option)
  • Be modular and extensible, allowing users to implement their own parsers, recovery strategies, error types, spans, and be generic over both input tokens and the output AST

Notes

My apologies to Noam for choosing such an absurd name.

License

Chumsky is licensed under the MIT license (see LICENSE in the main repository).

More Repositories

1

flume

A safe and fast multi-producer, multi-consumer channel.
Rust
1,820
star
2

ariadne

A fancy diagnostics & error reporting crate
Rust
1,309
star
3

tao

A statically-typed functional language with generics, typeclasses, sum types, pattern-matching, first-class functions, currying, algebraic effects, associated types, good diagnostics, etc.
Rust
923
star
4

pollster

A minimal async executor that lets you block on a future
Rust
288
star
5

broom

An ergonomic tracing garbage collector that supports mark 'n sweep garbage collection
Rust
243
star
6

euc

A software rendering crate that lets you write shaders with Rust
Rust
241
star
7

forge

A lightweight, elegant scripting language with built-in Rust-FFI.
Rust
161
star
8

atto

An insanely simple self-hosted functional programming language
Rust
141
star
9

parze

A clean, efficient parser combinator
Rust
123
star
10

teloren

A command-line frontend for Veloren
Rust
89
star
11

funkicrab

Optimising Brainfuck compiler: Run your beloved Brainfuck code, but faster.
Rust
64
star
12

openmw-volumetric-clouds

A volumetric clouds mod for OpenMW
64
star
13

openmw-shaders

Photorealistic shaders for Morrowind
GLSL
59
star
14

vulcan

A minimalistic text editor designed for both ordinary use and software development
Vala
45
star
15

lagoon

A thread pool crate with an array of features
Rust
36
star
16

mutation

Unleash the power of nightly Rust to write code that's generic over mutation!
Rust
23
star
17

coord-rs

[deprecated] A simple, ergonomic vector mathematics crate for Rust
Rust
22
star
18

errant

A (mostly) drop-in replacement for Rust's Result that provides backtrace support.
Rust
22
star
19

zte

Zesterer's Text Editor
Rust
18
star
20

leon

A lightweight scripting language for Rust
Rust
18
star
21

tupai

A modular POSIX-like operating system created for educational purposes
C++
16
star
22

gui

An experimental stateful, structured, declarative GUI crate
Rust
12
star
23

the-bitwise-challenge

Challenge: Can you develop a game with only 8 bytes of state?
9
star
24

vast-outdated

As The Name Suggests: A Pretty Large Space Sim
C++
8
star
25

vm-perf

Performance comparisons between various virtual interpreter implementation strategies
Rust
8
star
26

gba-test

Software rasterisation on the GBA in Rust. Some experiments from a while ago.
Rust
7
star
27

thoth

A modular, x86_64 micro-kernel operating system
C
6
star
28

alonzo

A pure Rust functional compiler backend
Rust
6
star
29

que

An experimental lock-free queue
Rust
6
star
30

cargo-veloren

Name-squatting, for the moment
Rust
5
star
31

babble

A (horrendously hackish) clean room reimplementation of the Library of Babel, originally at https://libraryofbabel.info (seriously, check it out)
Python
5
star
32

synco

An experimental ECS crate that makes use of GATs
Rust
5
star
33

fula

A functional programming language with Hindley-Milner type inference
Rust
5
star
34

fuckvm

A highly experimental Brainfuck-targetting LLVM-like compiler backend
Rust
4
star
35

smash

Yet another blazingly fast hashmap written in Rust
Rust
4
star
36

emul8

Yet another CHIP-8 emulator written in Rust
Rust
4
star
37

bitwise-examples

Example games that persist just 8 bytes of state between frames
Rust
4
star
38

browser

Vala
4
star
39

nilts-old

A game about many things. I don't know what, since most content is randomly generated.
C
3
star
40

oms

[WIP] Orbital mechanics simulation tool/library
Rust
3
star
41

hire-me

Hire me! Here's why...
3
star
42

turk

A generic minmax algorithm that may be used as an AI player for 2-player games
Rust
3
star
43

libvolume

A voxel engine library used primarily in my other projects
C++
3
star
44

wavefront

A Wavefront OBJ parser and utility crate
Rust
3
star
45

voxeltest

A test voxel engine program
C++
3
star
46

emul8or

A CHIP-8 emulator written using Vala and SDL
Vala
3
star
47

sdf-test

An experiment in Signed Distance Field (SDF) raytracing and raymarching
C++
3
star
48

ir

An experimental language intermediate representation
Rust
2
star
49

yurt

A highly experiment portable runtime
Rust
2
star
50

parze-new

Rust
2
star
51

nilts

Work in progress which will hopefully one day be good
C++
2
star
52

picos

Raspberry Pi Card Operating System
C
2
star
53

async-priority-queue

An async-aware priority queue
Rust
2
star
54

spurv

A free, open-source CPU and instruction set specification with a minimalist design
2
star
55

jsbarretto

A personal website
HTML
2
star
56

opplyse

A clean, efficient GTK3 text editor for programmers. The big brother of Vulcan.
Vala
2
star
57

nilts-oldish

The procedurally generated game
C++
2
star
58

super-block

A platforming game written for the CHIP-8
1
star
59

forge-demo

Run Forge code online!
JavaScript
1
star
60

cragmoor

A text-based ASCII RPG procedurally generated game inspired by Dwarf Fortress
C++
1
star
61

snakes-on-a-continuous-plane

A 2D Continuous Snakes Game Created For Ludum Dare 34
CMake
1
star
62

timber

I got bored one afternoon and started writing a desktop panel
Vala
1
star
63

vast-old

Vast is a space sim written in C++ using OpenGL
C++
1
star
64

pokerom

A GameBoy Color (GBC) emulator written in C++ with SDL 2
CMake
1
star