  • Stars
    1,064
  • Rank 43,369 (Top 0.9 %)
  • Language
    C#
  • License
    Apache License 2.0
  • Created about 8 years ago
  • Updated about 1 month ago

Repository Details

A C# parser construction toolkit with high-quality error reporting

Superpower

A parser combinator library based on Sprache. Superpower generates friendlier error messages through its support for token-driven parsers.

What is Superpower?

The job of a parser is to take a sequence of characters as input, and produce a data structure that's easier for a program to analyze, manipulate, or transform. From this point of view, a parser is just a function from string to T - where T might be anything from a simple number, a list of fields in a data format, or the abstract syntax tree of some kind of programming language.

Just like other kinds of functions, parsers can be built by hand, from scratch. This is-or-isn't a lot of fun, depending on the complexity of the parser you need to build (and how you plan to spend your next few dozen nights and weekends).

Superpower is a library for writing parsers in a declarative style that mirrors the structure of the target grammar. Parsers built with Superpower are fast, robust, and report precise and informative errors when invalid input is encountered.

Usage

Superpower is embedded directly into your C# program, without the need for any additional tools or build-time code generation tasks.

dotnet add package Superpower

The simplest text parsers consume characters directly from the source text:

// Parse one or more capital 'A's in a row
var parseA = Character.EqualTo('A').AtLeastOnce();

The Character.EqualTo() method is a built-in parser. The AtLeastOnce() method is a combinator that builds a more complex parser for a sequence of 'A' characters out of the simple parser for a single 'A'.
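As a rough sketch of how such a parser might be exercised, the snippet below runs it with TryParse(), which returns a result value instead of throwing. The variable names and sample inputs are illustrative assumptions, and the code assumes C# 9 top-level statements.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// The same parser as above, with its result type spelled out.
TextParser<char[]> parseA = Character.EqualTo('A').AtLeastOnce();

// TryParse() reports success or failure through the returned result.
var ok = parseA.TryParse("AAA");
Console.WriteLine(ok.HasValue);           // True
Console.WriteLine(new string(ok.Value));  // AAA

// No leading 'A' here, so the parser fails.
var failed = parseA.TryParse("BBB");
Console.WriteLine(failed.HasValue);       // False
Console.WriteLine(failed);                // prints the formatted error description
```

Parse() could be used instead when an exception on failure is preferred.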

Superpower includes a library of simple parsers and combinators from which more sophisticated parsers can be built:

TextParser<string> identifier =
    from first in Character.Letter
    from rest in Character.LetterOrDigit.Or(Character.EqualTo('_')).Many()
    select first + new string(rest);

var id = identifier.Parse("abc123");

Assert.Equal("abc123", id);

Parsers are highly modular, so smaller parsers can be built and tested independently of the larger parsers that use them.

Tokenization

Along with text parsers that consume input character-by-character, Superpower supports token parsers.

A token parser consumes elements from a list of tokens. A token is a fragment of the input text, tagged with the kind of item that fragment represents - usually specified using an enum:

public enum ArithmeticExpressionToken
{
    None,
    Number,
    Plus,
    // ...followed by Minus, Times, Divide, LParen, and RParen, as used by the tokenizer below
}

A major benefit of driving parsing from tokens, instead of individual characters, is that errors can be reported in terms of tokens - unexpected identifier `frm`, expected keyword `from` - instead of the cryptic unexpected `m`.

Token-driven parsing takes place in two distinct steps:

  1. Tokenization, using a class derived from Tokenizer<TKind>, then
  2. Parsing, using a function of type TokenListParser<TKind, T>.

var expression = "1 * (2 + 3)";

// 1.
var tokenizer = new ArithmeticExpressionTokenizer();
var tokenList = tokenizer.Tokenize(expression);

// 2.
var parser = ArithmeticExpressionParser.Lambda; // parser built with combinators
var expressionTree = parser.Parse(tokenList);

// Use the result
var eval = expressionTree.Compile();
Console.WriteLine(eval()); // -> 5

Assembling tokenizers with TokenizerBuilder<TKind>

The job of a tokenizer is to split the input into a list of tokens - numbers, keywords, identifiers, operators - while discarding irrelevant trivia such as whitespace or comments.

Superpower provides the TokenizerBuilder<TKind> class to quickly assemble tokenizers from recognizers, text parsers that match the various kinds of tokens required by the grammar.

A simple arithmetic expression tokenizer is shown below:

var tokenizer = new TokenizerBuilder<ArithmeticExpressionToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), ArithmeticExpressionToken.Plus)
    .Match(Character.EqualTo('-'), ArithmeticExpressionToken.Minus)
    .Match(Character.EqualTo('*'), ArithmeticExpressionToken.Times)
    .Match(Character.EqualTo('/'), ArithmeticExpressionToken.Divide)
    .Match(Character.EqualTo('('), ArithmeticExpressionToken.LParen)
    .Match(Character.EqualTo(')'), ArithmeticExpressionToken.RParen)
    .Match(Numerics.Natural, ArithmeticExpressionToken.Number)
    .Build();

Tokenizers constructed this way produce a list of tokens by repeatedly attempting to match recognizers against the input in top-to-bottom order.
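To see what a tokenizer built this way yields, the following minimal sketch enumerates the resulting token list. The DemoToken enum and the input string are hypothetical, and the code assumes C# 9 top-level statements.

```csharp
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Recognizers are tried in the order they are registered here.
var tokenizer = new TokenizerBuilder<DemoToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), DemoToken.Plus)
    .Match(Numerics.Natural, DemoToken.Number)
    .Build();

// Each token carries its kind and the span of source text it covers.
foreach (var token in tokenizer.Tokenize("12 + 345"))
    Console.WriteLine($"{token.Kind}: {token.ToStringValue()}");

// Prints:
// Number: 12
// Plus: +
// Number: 345

// Hypothetical token kinds used only for this illustration.
enum DemoToken { None, Number, Plus }
```

Whitespace never appears in the output because the Ignore() recognizer discards it.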

Writing tokenizers by hand

Tokenizers can alternatively be written by hand; this can provide the most flexibility, performance, and control, at the expense of more complicated code. A handwritten arithmetic expression tokenizer is included in the test suite, and a more complete example can be found here.

Writing token list parsers

Token parsers are defined in the same manner as text parsers, using combinators to build up more sophisticated parsers out of simpler ones.

class ArithmeticExpressionParser
{
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Add =
        Token.EqualTo(ArithmeticExpressionToken.Plus).Value(ExpressionType.AddChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Subtract =
        Token.EqualTo(ArithmeticExpressionToken.Minus).Value(ExpressionType.SubtractChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Multiply =
        Token.EqualTo(ArithmeticExpressionToken.Times).Value(ExpressionType.MultiplyChecked);
        
    static readonly TokenListParser<ArithmeticExpressionToken, ExpressionType> Divide = 
        Token.EqualTo(ArithmeticExpressionToken.Divide).Value(ExpressionType.Divide);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Constant =
        Token.EqualTo(ArithmeticExpressionToken.Number)
            .Apply(Numerics.IntegerInt32)
            .Select(n => (Expression)Expression.Constant(n));

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Factor =
        (from lparen in Token.EqualTo(ArithmeticExpressionToken.LParen)
            from expr in Parse.Ref(() => Expr)
            from rparen in Token.EqualTo(ArithmeticExpressionToken.RParen)
            select expr)
        .Or(Constant);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Operand =
        (from sign in Token.EqualTo(ArithmeticExpressionToken.Minus)
            from factor in Factor
            select (Expression)Expression.Negate(factor))
        .Or(Factor).Named("expression");

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Term =
        Parse.Chain(Multiply.Or(Divide), Operand, Expression.MakeBinary);

    static readonly TokenListParser<ArithmeticExpressionToken, Expression> Expr =
        Parse.Chain(Add.Or(Subtract), Term, Expression.MakeBinary);

    public static readonly TokenListParser<ArithmeticExpressionToken, Expression<Func<int>>>
        Lambda = Expr.AtEnd().Select(body => Expression.Lambda<Func<int>>(body));
}

Error messages

The error scenario tests demonstrate some of the error message formatting capabilities of Superpower. Check out the parsers referenced in the tests for some examples.

ArithmeticExpressionParser.Lambda.Parse(new ArithmeticExpressionTokenizer().Tokenize("1 + * 3"));
     // -> Syntax error (line 1, column 5): unexpected operator `*`, expected expression.
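The snippet below is a self-contained sketch of the same idea on a much smaller grammar, so the error path can be tried without the full arithmetic parser. The SumToken enum, the sum parser, and the inputs are all assumptions made for illustration, and the exact wording of the message is not shown because it depends on the library.

```csharp
using System;
using Superpower;
using Superpower.Display;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Step 1: a tokenizer for a hypothetical grammar of numbers and '+'.
var tokenizer = new TokenizerBuilder<SumToken>()
    .Ignore(Span.WhiteSpace)
    .Match(Character.EqualTo('+'), SumToken.Plus)
    .Match(Numerics.Natural, SumToken.Number)
    .Build();

// Step 2: a token list parser that adds the numbers together.
TokenListParser<SumToken, int> number =
    Token.EqualTo(SumToken.Number).Apply(Numerics.IntegerInt32);

TokenListParser<SumToken, int> sum =
    Parse.Chain(Token.EqualTo(SumToken.Plus), number, (op, left, right) => left + right)
        .AtEnd();

Console.WriteLine(sum.Parse(tokenizer.Tokenize("1 + 2 + 3"))); // 6

try
{
    // The second '+' appears where a number is required.
    sum.Parse(tokenizer.Tokenize("1 + + 3"));
}
catch (ParseException ex)
{
    // The message describes the problem in terms of tokens.
    Console.WriteLine(ex.Message);
}

// Hypothetical token kinds used only for this illustration.
enum SumToken
{
    None,
    Number,
    [Token(Category = "operator", Example = "+")]
    Plus
}
```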

To improve the error reporting for a particular token type, apply the [Token] attribute:

public enum ArithmeticExpressionToken
{
    None,

    Number,

    [Token(Category = "operator", Example = "+")]
    Plus,
    // ...remaining token kinds
}

Performance

Superpower is built with performance as a priority. Less frequent backtracking, combined with the avoidance of allocations and indirect dispatch, means that Superpower can be quite a bit faster than Sprache.

Recent benchmark for parsing a long arithmetic expression:

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Windows
Processor=?, ProcessorCount=8
Frequency=2533306 ticks, Resolution=394.7411 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003121

Type=ArithmeticExpressionBenchmark  Mode=Throughput  
Method              Median       StdDev      Scaled  Scaled-SD
Sprache             283.8618 µs  10.0276 µs  1.00    0.00
Superpower (Token)  81.1563 µs   2.8775 µs   0.29    0.01

Benchmarks and results are included in the repository.

Tips: if you find you need more throughput: 1) consider a hand-written tokenizer, and 2) avoid the use of LINQ comprehensions and instead use chained combinators like Then() and especially IgnoreThen() - these allocate fewer delegates (closures) during parsing.
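As an illustration of the second tip, the sketch below writes the same small parser twice: once as a LINQ comprehension and once with IgnoreThen() and Then(). The parser names and the "(42)" input are assumptions for the example, and no performance difference is measured here.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

// LINQ comprehension form: mirrors the grammar closely.
TextParser<int> viaLinq =
    from open in Character.EqualTo('(')
    from n in Numerics.IntegerInt32
    from close in Character.EqualTo(')')
    select n;

// Chained form: IgnoreThen() drops the '(' result, and Then()
// carries the parsed number past the closing ')'.
TextParser<int> viaChain =
    Character.EqualTo('(')
        .IgnoreThen(Numerics.IntegerInt32)
        .Then(n => Character.EqualTo(')').Value(n));

Console.WriteLine(viaLinq.Parse("(42)"));  // 42
Console.WriteLine(viaChain.Parse("(42)")); // 42
```

Both forms accept the same input, so a parser can be converted one rule at a time and checked against its existing tests.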

Examples

Superpower is introduced, with a worked example, in this blog post.

Example parsers to learn from:

  • JsonParser is a complete, annotated example implementing the JSON spec with good error reporting
  • DateTimeTextParser shows how Superpower's text parsers work, parsing ISO-8601 date-times
  • IntCalc is a simple arithmetic expression parser (1 + 2 * 3) included in the repository, demonstrating how Superpower token parsing works
  • Plotty implements an instruction set for a RISC virtual machine
  • tcalc is an example expression language that computes durations (1d / 12m)

Real-world projects built with Superpower:

  • Serilog.Expressions uses Superpower to implement an expression and templating language for structured log events
  • The query language of Seq is implemented using Superpower
  • seqcli extraction patterns use Superpower for plain-text log parsing
  • PromQL.Parser is a parser for the Prometheus Query Language

Have an example we can add to this list? Let us know.

Getting help

Please post issues to the issue tracker, or tag your question on Stack Overflow with superpower.

The repository's title arose out of a talk "Parsing Text: the Programming Superpower You Need at Your Fingertips" given at DDD Brisbane 2015.
