• Stars
    star
    136
  • Rank 258,583 (Top 6 %)
  • Language
    Haskell
  • License
    MIT License
  • Created about 3 years ago
  • Updated 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast parsing from bytestrings

flatparse

Hackage CI

flatparse is a high-performance parsing library, supporting parsing for programming languages, human-readable data and machine-readable data. The "flat" in the name refers to the ByteString parsing input, which has pinned contiguous data, and also to the library internals, which avoids indirections and heap allocations whenever possible. flatparse is generally lower-level than parsec-style libraries, but it is possible to build higher-level features (such as source spans, hints, indentation parsing) on top of it, without making any compromises in performance.

LLVM

It is advised to build with -fllvm option when using this package, since that can result in significant speedups (20-40% from what I've seen). Additionally, you can enable -fllvm for flatparse specifically by enabling the llvm package flag. However, this has minor impact, since almost all parser code will be typically inlined into modules outside flatparse, and compiled there.

Features and non-features

  • Excellent performance. On microbenchmarks, flatparse is around 10 times faster than attoparsec or megaparsec. On larger examples with heavier use of source positions and spans and/or indentation parsing, the performance difference grows to 20-30 times. Compile times and executable sizes are also significantly better with flatparse than with megaparsec or attoparsec. flatparse internals make liberal use of unboxed tuples and GHC primops. As a result, pure validators (parsers returning ()) in flatparse are not difficult to implement with zero heap allocation.
  • No incremental parsing, and only strict ByteString is supported as input. However, it can be still useful to convert from Text, String or other types to ByteString, and then use flatparse for parsing, since flatparse performance usually more than makes up for the conversion costs.
  • Only little-endian 64 bit systems are currently supported as the host machine. This may change in the future. Getting good performance requires architecture-specific optimizations; I've only considered the most common setting at this point. However, flatparse does include primitive integer parsers with specific endianness.
  • Support for fast source location handling, indentation parsing and informative error messages. flatparse provides a low-level interface to these. Batteries are not included, but it should be possible for users to build custom solutions, which are more sophisticated, but still as fast as possible. In my experience, the included batteries in other libraries often come with major unavoidable overheads, and often we still have to extend existing machinery in order to scale to production features.
  • The backtracking model of flatparse is different to parsec libraries, and is more close to the nom library in Rust. The idea is that parser failure is distinguished from parsing error. The former is used for control flow, and we can backtrack from it. The latter is used for unrecoverable errors, and by default it's propagated to the top. flatparse does not track whether parsers have consumed inputs. In my experience, what we really care about is the failure/error distinction, and in parsec or megaparsec the consumed/non-consumed separation is often muddled and discarded in larger parser implementations. By default, basic flatparse parsers can fail but can not throw errors, with the exception of the specifically error-throwing operations. Hence, flatparse users have to be mindful about grammar, and explicitly insert errors where it is known that the input can't be valid.

flatparse comes in two flavors: FlatParse.Basic and FlatParse.Stateful. Both support a custom error type. Also, both come in three modes, where we can respectively run IO actions, ST actions, or no side effects. The modes are selected by a state token type parameter on the parser types.

  • FlatParse.Basic only supports the above features. If you don't need indentation parsing, this is sufficient.
  • FlatParse.Stateful additionally supports a built-in Int worth of internal state and an additional custom reader environment. This can support a wide range of indentation parsing features. There is a slight overhead in performance and code size compared to Basic. However, in small parsers and microbenchmarks the difference between Basic and Stateful is often reduced to near zero by GHC and/or LLVM optimization.

Tutorial

Informative tutorials are work in progress. See src/FlatParse/Examples for a lexer/parser example with acceptably good error messages.

Contribution

Pull requests are welcome. I'm fairly quick to add PR authors as collaborators.

Some benchmarks

Execution times below. See source code in bench. Compiled with GHC 9.4.4 -O2 -fllvm. Executed on Intel 1165G7 CPU at 28W power draw. Uses nightly-2023-02-06 Stackage snapshot for the involved packages.

benchmark runtime
sexp/fpbasic 1.93 ms
sexp/fpstateful 2.00 ms
sexp/attoparsec 21.82 ms
sexp/megaparsec 59.60 ms
sexp/parsec 79.81 ms
long keyword/fpbasic 96.95 μs
long keyword/fpstateful 95.30 μs
long keyword/attoparsec 2.43 ms
long keyword/megaparsec 5.196 ms
long keyword/parsec 10.02 ms
numeral csv/fpbasic 715.2 μs
numeral csv/fpstateful 555.0 μs
numeral csv/attoparsec 10.52 ms
numeral csv/megaparsec 19.77 ms
numeral csv/parsec 26.46 ms

Object file sizes for each module containing the s-exp, long keyword and numeral csv benchmarks.

library object file size (bytes)
fpbasic 20656
fpstateful 26664
attoparsec 69384
megaparsec 226232
parsec 117696

More Repositories

1

elaboration-zoo

Minimal implementations for dependent type checking and elaboration
Haskell
550
star
2

smalltt

Demo for high-performance type theory elaboration
Lean
489
star
3

staged

Staged compilation with dependent types
TeX
119
star
4

setoidtt

Prototype implementations of systems based on setoid type theory
Haskell
64
star
5

cctt

high-performance cubical evaluation
TeX
58
star
6

staged-fusion

Staged push/pull fusion with typed Template Haskell
Haskell
55
star
7

system-f-omega

System F-omega normalization by hereditary substitution in Agda
Agda
54
star
8

normalization-bench

Lambda normalization and conversion checking benchmarks for various implementations
Haskell
51
star
9

implicit-fun-elaboration

Implementation for ICFP 2020 paper
TeX
48
star
10

sett

Setoid type theory implementation
Haskell
38
star
11

polynomial-model

A polynomial model of a Martin-Löf type theory + a bit of game semantics
Agda
29
star
12

universes

Generalized syntax & semantics for universe hierarchies
TeX
28
star
13

thesis

my phd thesis
TeX
26
star
14

stlc-nbe

Correctness of normalization-by-evaluation for STLC
Agda
21
star
15

misc-stuff

Code for tutorials, papers and experiments. Mostly Agda, Coq and Haskell.
Agda
19
star
16

flat-maybe

Rust-style strict Maybe in Haskell: no space/indirection overhead.
Haskell
16
star
17

preordertt

Experiments with preordered set models of (directed) type theories
Agda
15
star
18

SemanticsWithApplications

Formal semantics in Agda.
Agda
14
star
19

pny1-assignment

College assignment writing in which I ramble about type classes and dependent types.
Agda
12
star
20

qiit-generalizations

Extending small finitary QIITs to non-small non-finitary
TeX
9
star
21

dawg

Generation and traversal of highly compressed directed acyclic word graphs.
Haskell
7
star
22

singleton-nats

Unary natural numbers relying on the singletons infrastructure
Haskell
7
star
23

trie-vector

Clojure-style persistent vector for Haskell.
Haskell
6
star
24

BasicHs

Basic Haskell ("funkcionális programozás gyakorlat") lecture notes
Haskell
4
star
25

HaladoHs

2018 Spring "Haladó Haskell" ("Advanced Haskell") course materials (in Hungarian)
Haskell
4
star
26

array-primops

Extra foreign primops for primitive arrays.
Haskell
4
star
27

primdata

Haskell
3
star
28

ind-ind-types

Materials for TYPES 2019 submissions about inductive-inductive types
TeX
3
star
29

dynamic-array

Haskell
3
star
30

dynamic-mvector

A wrapper around MVector that enables push/pop/append functionality.
Haskell
3
star
31

AndrasKovacs.github.io

My github page
HTML
2
star
32

func-lang-2019-1

Funkcionális nyelvek (IPM-18sztKVFPNYEG), 2019 tavasz, EA + GY
Haskell
2
star
33

dawg-gen

Creates highly compressed directed acyclic word graphs from word lists.
Python
2
star
34

elements-of-compsys

Projects for the book "The Elements of Computing Systems"
Assembly
1
star
35

glue

Glued models of type theory, using internal languages
Agda
1
star
36

scrabble-bot

Scrabble move generation
Haskell
1
star
37

elte-cbsd

"Component based software development" course assignments.
Haskell
1
star
38

bi-zipper

Simple bidirectional zipper from any Traversable.
Haskell
1
star
39

sysF-NbE

Normalization by evaluation for intrinsic System F
Agda
1
star
40

haskell-performance

Miscellaneous Haskell benchmarks
Haskell
1
star
41

bf

Fast Haskell brainfuck interpreter
Brainfuck
1
star
42

while-hs

Translation of an undergrad course compiler to Haskell.
Haskell
1
star