A minimal `syn` syntax tree pretty-printer

prettyplease::unparse



Overview

This is a pretty-printer to turn a syn syntax tree into a String of well-formatted source code. In contrast to rustfmt, this library is intended to be suitable for arbitrary generated code.

Rustfmt prioritizes high-quality output that is impeccable enough that you'd be comfortable spending your career staring at its output, but that means some heavyweight algorithms, and it has a tendency to bail out on code that is hard to format (for example rustfmt#3697, and there are dozens more issues like it). That's not necessarily a big deal for human-generated code because when code gets highly nested, the human will naturally be inclined to refactor into more easily formattable code. But for generated code, having the formatter just give up leaves it totally unreadable.

This library is designed using the simplest possible algorithm and data structures that can deliver about 95% of the quality of rustfmt-formatted output. In my experience testing real-world code, approximately 97-98% of output lines come out identical between rustfmt's formatting and this crate's. The rest have slightly different linebreak decisions, but still clearly follow the dominant modern Rust style.

The tradeoffs made by this crate are a good fit for generated code that you will not spend your career staring at. For example, the output of bindgen, or the output of cargo-expand. In those cases it's more important that the whole thing be formattable without the formatter giving up, than that it be flawless.


Feature matrix

Here are a few superficial comparisons of this crate against the AST pretty-printer built into rustc, and rustfmt. The sections below go into more detail comparing the output of each of these libraries.

                                                                        prettyplease  rustc     rustfmt
non-pathological behavior on big or generated code                      💚            ❌        ❌
idiomatic modern formatting ("locally indistinguishable from rustfmt")  💚            ❌        💚
throughput                                                              60 MB/s       39 MB/s   2.8 MB/s
number of dependencies                                                  3             72        66
compile time including dependencies                                     2.4 sec       23.1 sec  29.8 sec
buildable using a stable Rust compiler                                  💚            ❌        ❌
published to crates.io                                                  💚            ❌        ❌
extensively configurable output                                         ❌            ❌        💚
intended to accommodate hand-maintained source code                     ❌            ❌        💚

Comparison to rustfmt

If you weren't told which output file is which, it would be practically impossible to tell, except for line 435 in the rustfmt output, which is more than 1000 characters long because rustfmt just gave up formatting that part of the file:

            match segments[5] {
                0 => write!(f, "::{}", ipv4),
                0xffff => write!(f, "::ffff:{}", ipv4),
                _ => unreachable!(),
            }
        } else { # [derive (Copy , Clone , Default)] struct Span { start : usize , len : usize , } let zeroes = { let mut longest = Span :: default () ; let mut current = Span :: default () ; for (i , & segment) in segments . iter () . enumerate () { if segment == 0 { if current . len == 0 { current . start = i ; } current . len += 1 ; if current . len > longest . len { longest = current ; } } else { current = Span :: default () ; } } longest } ; # [doc = " Write a colon-separated part of the address"] # [inline] fn fmt_subslice (f : & mut fmt :: Formatter < '_ > , chunk : & [u16]) -> fmt :: Result { if let Some ((first , tail)) = chunk . split_first () { write ! (f , "{:x}" , first) ? ; for segment in tail { f . write_char (':') ? ; write ! (f , "{:x}" , segment) ? ; } } Ok (()) } if zeroes . len > 1 { fmt_subslice (f , & segments [.. zeroes . start]) ? ; f . write_str ("::") ? ; fmt_subslice (f , & segments [zeroes . start + zeroes . len ..]) } else { fmt_subslice (f , & segments) } }
    } else {
        const IPV6_BUF_LEN: usize = (4 * 8) + 7;
        let mut buf = [0u8; IPV6_BUF_LEN];
        let mut buf_slice = &mut buf[..];

This is a pretty typical manifestation of rustfmt bailing out in generated code: a chunk of the input ends up on one line. The other manifestation is that you're working on some code, running rustfmt on save like a conscientious developer, but after a while notice it isn't doing anything. You introduce an intentional formatting issue, like a stray indent or semicolon, and run rustfmt to check your suspicion. Nope, it doesn't get cleaned up; rustfmt is just not formatting the part of the file you are working on.

The prettyplease library is designed to have no pathological cases that force a bail out; the entire input you give it will get formatted in some "good enough" form.

Separately, rustfmt can be problematic to integrate into projects. It's written using rustc's internal syntax tree, so it can't be built by a stable compiler. Its releases are not regularly published to crates.io, so in Cargo builds you'd need to depend on it as a git dependency, which precludes publishing your crate to crates.io also. You can shell out to a rustfmt binary, but that'll be whatever rustfmt version is installed on each developer's system (if any), which can lead to spurious diffs in checked-in generated code formatted by different versions. In contrast, prettyplease is designed to be easy to pull in as a library, and compiles fast.


Comparison to rustc_ast_pretty

This is the pretty-printer that gets used when rustc prints source code, such as rustc -Zunpretty=expanded. It's also used by the standard library's stringify! when stringifying an interpolated macro_rules AST fragment, like an $:expr, and transitively by dbg! and many macros in the ecosystem.

Rustc's formatting is mostly okay, but does not hew closely to the dominant contemporary style of Rust formatting. Some things wouldn't ever be written on one line, like this match expression, and certainly not with a comma in front of the closing brace:

fn eq(&self, other: &IpAddr) -> bool {
    match other { IpAddr::V4(v4) => self == v4, IpAddr::V6(_) => false, }
}

Some places use non-multiple-of-4 indentation, which is definitely not the norm:

pub const fn to_ipv6_mapped(&self) -> Ipv6Addr {
    let [a, b, c, d] = self.octets();
    Ipv6Addr{inner:
                 c::in6_addr{s6_addr:
                                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF,
                                  0xFF, a, b, c, d],},}
}

And although there isn't an egregious example of it in the link because the input code is pretty tame, in general rustc_ast_pretty has pathological behavior on generated code. It has a tendency to use excessive horizontal indentation and rapidly run out of width:

::std::io::_print(::core::fmt::Arguments::new_v1(&[""],
                                                 &match (&msg,) {
                                                      _args =>
                                                      [::core::fmt::ArgumentV1::new(_args.0,
                                                                                    ::core::fmt::Display::fmt)],
                                                  }));

The snippets above are clearly different from modern rustfmt style. In contrast, prettyplease is designed to have output that is practically indistinguishable from rustfmt-formatted code.


Example

// [dependencies]
// prettyplease = "0.2"
// syn = { version = "2", default-features = false, features = ["full", "parsing"] }

const INPUT: &str = stringify! {
    use crate::{
          lazy::{Lazy, SyncLazy, SyncOnceCell}, panic,
        sync::{ atomic::{AtomicUsize, Ordering::SeqCst},
            mpsc::channel, Mutex, },
      thread,
    };
    impl<T, U> Into<U> for T where U: From<T> {
        fn into(self) -> U { U::from(self) }
    }
};

fn main() {
    let syntax_tree = syn::parse_file(INPUT).unwrap();
    let formatted = prettyplease::unparse(&syntax_tree);
    print!("{}", formatted);
}

Algorithm notes

The approach and terminology used in the implementation are derived from Derek C. Oppen, "Pretty Printing" (1979), on which rustc_ast_pretty is also based, and from rustc_ast_pretty's implementation written by Graydon Hoare in 2011 (and modernized over the years by dozens of volunteer maintainers).

The paper describes two language-agnostic interacting procedures Scan() and Print(). Language-specific code decomposes an input data structure into a stream of string and break tokens, and begin and end tokens for grouping. Each begin-end range may be identified as either "consistent breaking" or "inconsistent breaking". If a group is consistently breaking, then if the whole contents do not fit on the line, every break token in the group will receive a linebreak. This is appropriate, for example, for Rust struct literals, or arguments of a function call. If a group is inconsistently breaking, then the string tokens in the group are greedily placed on the line until out of space, and linebroken only at those break tokens for which the next string would not fit. For example, this is appropriate for the contents of a braced use statement in Rust.
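To make the two breaking modes concrete, here is a deliberately naive sketch. The `Token` and `Breaks` types and the `group_width`, `chunk_width`, `print`, and `call_tokens` functions are hypothetical, not prettyplease's internal API. Indentation is omitted, and widths are recomputed by scanning ahead at every begin and break token, which is quadratic in the worst case rather than the linear-time Scan/Print pair the paper describes.

```rust
// Toy Oppen-style token model (illustrative only; not prettyplease's API).
// Indentation is omitted for brevity.
#[derive(Clone, Copy, Debug)]
enum Breaks {
    Consistent,
    Inconsistent,
}

enum Token {
    Text(&'static str),
    Break, // a space, or a newline if the printer breaks here
    Begin(Breaks),
    End,
}

/// Flat width of the group opened at `begin`, up to its matching End.
fn group_width(tokens: &[Token], begin: usize) -> usize {
    let (mut depth, mut width) = (0, 0);
    for tok in &tokens[begin + 1..] {
        match tok {
            Token::Text(s) => width += s.chars().count(),
            Token::Break => width += 1,
            Token::Begin(_) => depth += 1,
            Token::End if depth == 0 => break,
            Token::End => depth -= 1,
        }
    }
    width
}

/// Flat width from `start` to the next Break at the same level,
/// or to the End of the enclosing group, whichever comes first.
fn chunk_width(tokens: &[Token], start: usize) -> usize {
    let (mut depth, mut width) = (0, 0);
    for tok in &tokens[start..] {
        match tok {
            Token::Text(s) => width += s.chars().count(),
            Token::Break if depth == 0 => break,
            Token::Break => width += 1,
            Token::Begin(_) => depth += 1,
            Token::End if depth == 0 => break,
            Token::End => depth -= 1,
        }
    }
    width
}

fn print(tokens: &[Token], width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    // One entry per open group: its kind and whether it was decided to be broken.
    let mut stack: Vec<(Breaks, bool)> = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        match tok {
            Token::Text(s) => {
                out.push_str(s);
                col += s.chars().count();
            }
            // A group is broken iff its flat contents don't fit on the rest of the line.
            Token::Begin(kind) => stack.push((*kind, col + group_width(tokens, i) > width)),
            Token::End => {
                stack.pop();
            }
            Token::Break => {
                let newline = match stack.last() {
                    // Consistent: all-or-nothing, decided once at Begin.
                    Some(&(Breaks::Consistent, broken)) => broken,
                    // Inconsistent (and top level): break only if the next chunk won't fit.
                    _ => col + 1 + chunk_width(tokens, i + 1) > width,
                };
                if newline {
                    out.push('\n');
                    col = 0;
                } else {
                    out.push(' ');
                    col += 1;
                }
            }
        }
    }
    out
}

/// Tokens for `call(aaa, bbb, ccc)` with the arguments in one group of the given kind.
fn call_tokens(kind: Breaks) -> Vec<Token> {
    vec![
        Token::Text("call("),
        Token::Begin(kind),
        Token::Text("aaa,"),
        Token::Break,
        Token::Text("bbb,"),
        Token::Break,
        Token::Text("ccc"),
        Token::End,
        Token::Text(")"),
    ]
}

fn main() {
    for kind in [Breaks::Consistent, Breaks::Inconsistent] {
        println!("{:?}:\n{}\n", kind, print(&call_tokens(kind), 15));
    }
}
```

With a 15-column limit, the consistent group puts every argument on its own line, while the inconsistent group fills greedily and only breaks before `ccc`. With plenty of room, both print `call(aaa, bbb, ccc)`.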

Scan's job is to efficiently accumulate sizing information about groups and breaks. For every begin token we compute the distance to the matched end token, and for every break we compute the distance to the next break. The algorithm uses a ringbuffer to hold tokens whose size is not yet ascertained. The maximum size of the ringbuffer is bounded by the target line length and does not grow indefinitely, regardless of deep nesting in the input stream. That's because once a group is sufficiently big, the precise size can no longer make a difference to linebreak decisions and we can effectively treat it as "infinity".
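The sizes that Scan computes can be sketched offline, assuming the whole token stream is already in memory. The hypothetical `scan` function below is not the streaming ring-buffer version: it keeps an unbounded stack of pending begin and break tokens and only illustrates which distances are measured and why clamping them is safe. Any size greater than the line width can never fit, so replacing it with `width + 1` (standing in for "infinity") cannot change a linebreak decision.

```rust
// Illustrative tokens for this sketch only (not prettyplease's API).
#[derive(Clone, Copy)]
enum Token {
    Text(&'static str),
    Break,
    Begin,
    End,
}

/// If the most recent pending token is a Break, its chunk ends here.
fn close_pending_break(
    tokens: &[Token],
    pending: &mut Vec<(usize, usize)>,
    size: &mut [usize],
    total: usize,
) {
    if let Some(&(j, start)) = pending.last() {
        if matches!(tokens[j], Token::Break) {
            size[j] = total - start;
            pending.pop();
        }
    }
}

/// Offline sketch of Scan. Sizes: Text -> its length; Begin -> flat width up to
/// the matching End; Break -> its blank plus flat width up to the next Break or
/// End at the same level. Sizes above `width` are clamped to `width + 1`.
fn scan(tokens: &[Token], width: usize) -> Vec<usize> {
    let mut size = vec![0; tokens.len()];
    let mut total = 0; // flat width of everything scanned so far
    let mut pending: Vec<(usize, usize)> = Vec::new(); // (token index, total at that token)
    for (i, tok) in tokens.iter().enumerate() {
        match *tok {
            Token::Text(s) => {
                size[i] = s.chars().count();
                total += size[i];
            }
            Token::Begin => pending.push((i, total)),
            Token::End => {
                // Close the group's last Break (if any), then its Begin.
                close_pending_break(tokens, &mut pending, &mut size, total);
                let (j, start) = pending.pop().expect("unbalanced End");
                size[j] = total - start;
            }
            Token::Break => {
                close_pending_break(tokens, &mut pending, &mut size, total);
                pending.push((i, total));
                total += 1; // the Break's own blank
            }
        }
    }
    // Top-level Breaks still pending extend to the end of the stream.
    for (j, start) in pending {
        size[j] = total - start;
    }
    size.into_iter().map(|s| s.min(width + 1)).collect()
}

fn main() {
    // `call(aaa, bbb, ccc)` with the arguments in one group.
    let tokens = [
        Token::Text("call("),
        Token::Begin,
        Token::Text("aaa,"),
        Token::Break,
        Token::Text("bbb,"),
        Token::Break,
        Token::Text("ccc"),
        Token::End,
        Token::Text(")"),
    ];
    println!("{:?}", scan(&tokens, 80));
    println!("{:?}", scan(&tokens, 10));
}
```

For `call(aaa, bbb, ccc)` the begin token gets size 13 (the flat width of `aaa, bbb, ccc`) and the two breaks get 5 and 4 (the blank plus `bbb,`, and the blank plus `ccc`). With a width of 10, the 13 is clamped to 11, which still reads as "does not fit".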

Print's job is to use the sizing information to efficiently assign a "broken" or "not broken" status to every begin token. At that point the output is easily constructed by concatenating string tokens and breaking at break tokens contained within a broken group.

Leveraging these primitives (i.e. cleverly placing the all-or-nothing consistent breaks and greedy inconsistent breaks) to yield rustfmt-compatible formatting for all of Rust's syntax tree nodes is a fun challenge.

Here is a visualization of some Rust tokens fed into the pretty printing algorithm. Consistently breaking begin-end pairs are represented by «», inconsistently breaking by ‹›, break by ·, and the rest of the non-whitespace are string.

use crate::«{·
‹    lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,·
    panic,·
    sync::«{·
‹        atomic::«{·‹AtomicUsize,· Ordering::SeqCst›·}»,·
        mpsc::channel,· Mutex›,·
    }»,·
    thread›,·
}»;·
«‹«impl<«·T‹›,· U‹›·»>» Into<«·U·»>· for T›·
where·
    U:‹ From<«·T·»>›,·
{·
«    fn into(·«·self·») -> U {·
‹        U::from(«·self·»)›·
»    }·
»}·

The algorithm described in the paper is not quite sufficient for producing well-formatted Rust code that is locally indistinguishable from rustfmt's style. The reason is that in the paper, the complete non-whitespace contents are assumed to be independent of linebreak decisions, with Scan and Print being only in control of the whitespace (spaces and line breaks). In Rust as idiomatically formatted by rustfmt, that is not the case. Trailing commas are one example; the punctuation is only known after the broken vs non-broken status of the surrounding group is known:

let _ = Struct { x: 0, y: true };

let _ = Struct {
    x: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    y: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy,   //<- trailing comma if the expression wrapped
};

The formatting of match expressions is another case; we want small arms on the same line as the pattern, and big arms wrapped in a brace. Whether the braces, the comma, and the semicolon are present all depends on whether the arm fits on the line:

match total_nanos.checked_add(entry.nanos as u64) {
    Some(n) => tmp = n,   //<- small arm, inline with comma
    None => {
        total_secs = total_secs
            .checked_add(total_nanos / NANOS_PER_SEC as u64)
            .expect("overflow in iter::sum over durations");
    }   //<- big arm, needs brace added, and also semicolon^
}

The printing algorithm implementation in this crate accommodates all of these situations with conditional punctuation tokens whose selection can be deferred and populated after it's known that the group is or is not broken.
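A minimal sketch of such a deferred punctuation token, assuming a hypothetical `IfBroken` variant that prints its text only when the enclosing group ends up broken. It contributes zero width when the group is measured flat, because in that case it is omitted. The types, the per-group `indent`, the break `offset`, and the restriction to consistent groups are simplifications for illustration, not prettyplease's actual printer.

```rust
// Hypothetical token model with conditional punctuation; consistent groups only.
enum Token {
    Text(&'static str),
    Break { offset: isize }, // space if the group fits, else newline + (group indent + offset)
    Begin { indent: usize }, // extra indentation for breaks inside the group
    End,
    IfBroken(&'static str), // printed only if the enclosing group is broken
}

/// Flat width of the group opened at `begin`; IfBroken counts as zero.
fn flat_width(tokens: &[Token], begin: usize) -> usize {
    let (mut depth, mut width) = (0, 0);
    for tok in &tokens[begin + 1..] {
        match tok {
            Token::Text(s) => width += s.chars().count(),
            Token::Break { .. } => width += 1,
            Token::IfBroken(_) => {} // omitted when the group fits on one line
            Token::Begin { .. } => depth += 1,
            Token::End if depth == 0 => break,
            Token::End => depth -= 1,
        }
    }
    width
}

fn print(tokens: &[Token], width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    let mut stack: Vec<(usize, bool)> = Vec::new(); // (indent, broken) per open group
    for (i, tok) in tokens.iter().enumerate() {
        let (indent, broken) = stack.last().copied().unwrap_or((0, false));
        match tok {
            Token::Text(s) => {
                out.push_str(s);
                col += s.chars().count();
            }
            Token::IfBroken(s) => {
                // The deferred decision: emit the punctuation only in the broken layout.
                if broken {
                    out.push_str(s);
                    col += s.chars().count();
                }
            }
            Token::Begin { indent: extra } => {
                stack.push((indent + extra, col + flat_width(tokens, i) > width));
            }
            Token::End => {
                stack.pop();
            }
            Token::Break { offset } => {
                if broken {
                    let n = (indent as isize + offset).max(0) as usize;
                    out.push('\n');
                    out.push_str(&" ".repeat(n));
                    col = n;
                } else {
                    out.push(' ');
                    col += 1;
                }
            }
        }
    }
    out
}

/// Hypothetical tokens for `let _ = Struct { x: 0, y: true };`.
fn struct_literal() -> Vec<Token> {
    vec![
        Token::Text("let _ = Struct {"),
        Token::Begin { indent: 4 },
        Token::Break { offset: 0 },
        Token::Text("x: 0,"),
        Token::Break { offset: 0 },
        Token::Text("y: true"),
        Token::IfBroken(","),        // trailing comma only in the multi-line form
        Token::Break { offset: -4 }, // closing brace returns to the outer indentation
        Token::End,
        Token::Text("};"),
    ]
}

fn main() {
    let tokens = struct_literal();
    println!("{}\n", print(&tokens, 40));
    println!("{}", print(&tokens, 20));
}
```

At width 40 this prints `let _ = Struct { x: 0, y: true };`. At width 20 the group breaks, each field goes on its own indented line, and the trailing comma after `y: true` appears.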


License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
