Clj-Antlr

Clojure bindings for the ANTLR 4 parser library, an adaptive LL(*) parser. Looks a lot like Instaparse, only much faster, with richer grammar definitions, and none of Instaparse's delightful features.

Installation

Just add clj-antlr to your project.clj, and load a grammar file at runtime.

No ANTLR installation is required; clj-antlr will load the grammar for you, no compilation needed. No macros, either! Running the parser in interpreted mode is a tad slower than the compiled parsers that ANTLR can emit, but it means a lot less hassle for folks getting started.

Usage

user=> (require ['clj-antlr.core :as 'antlr])
nil
user=> (def json (antlr/parser "grammars/Json.g4"))
#'user/json
user=> (pprint (json "[1,2,3]"))
(:jsonText
 (:jsonArray
  "["
  (:jsonValue (:jsonNumber "1"))
  ","
  (:jsonValue (:jsonNumber "2"))
  ","
  (:jsonValue (:jsonNumber "3"))
  "]"))

Parsers act like functions, and can take strings, InputStreams, and Readers as their arguments. They emit trees of lists: each list begins with the keyword node name, followed by the node's children. Terminal nodes are represented as strings.
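Because the tree is ordinary nested data, plain Clojure sequence functions are enough to walk it. The sketch below is only an illustration: `terminals` and `numbers` are hypothetical helpers, not part of clj-antlr, and the tree is the `[1,2,3]` output shown above, pasted in as a quoted literal so the snippet runs without a grammar file.

```clojure
;; The parse tree for "[1,2,3]" from the example above, as a quoted literal.
(def tree
  '(:jsonText
    (:jsonArray
     "["
     (:jsonValue (:jsonNumber "1"))
     ","
     (:jsonValue (:jsonNumber "2"))
     ","
     (:jsonValue (:jsonNumber "3"))
     "]")))

;; Hypothetical helper: collect terminal strings in document order.
(defn terminals [node]
  (if (string? node)
    [node]                           ; a terminal is just a string
    (mapcat terminals (rest node)))) ; skip the node name, recurse into children

;; Hypothetical helper: find every :jsonNumber node and parse its text.
(defn numbers [node]
  (->> (tree-seq sequential? rest node)
       (filter #(and (sequential? %) (= :jsonNumber (first %))))
       (map #(Long/parseLong (second %)))))

(apply str (terminals tree)) ; => "[1,2,3]"
(numbers tree)               ; => (1 2 3)
```

Concatenating the terminals reproduces the input here only because this input contains no whitespace or hidden-channel tokens.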

You can define parsers directly with strings, too. ANTLR 4.2 will complain but compile successfully; 4.2.1 will include a patch to fix that.

user=> (def aaa (antlr/parser "grammar Aaa;
  #_=>                         aaa : AA+;
  #_=>                         AA : [Aa]+ ;
  #_=>                         WS : ' ' -> channel(HIDDEN) ;"))
#'user/aaa
user=> (aaa "aAAaa A aAA AAAAaAA")
(:aaa "aAAaa" "A" "aAA" "AAAAaAA")

Works with split (lexer and parser) grammars:

  clj::clj-antlr.core-test=> (let [g (parser "grammars/L.g4" "grammars/T.g4" {})] (g "abbc"))
  (:s "a" "b" "b" "c")

Errors

ANTLR can recover from errors in mid-parse by performing single-token insertion and single-token deletion on mismatched error tokens, where possible. This means an ANTLR parse may throw an error, but still produce useful parse information; or produce multiple errors. Parsing an invalid string will throw an exception with a textual explanation of the errors encountered:

user=> (json "[1,2,,3,]")

ParseError extraneous input ',' expecting {'null', '{', '[', 'false', 'true', NUMBER, STRING}
mismatched input ']' expecting {'null', '{', '[', 'false', 'true', NUMBER, STRING}  clj-antlr.common/parse-error (common.clj:106)

But wait, there's more! ParseErrors are deref-able, yielding detailed debugging information:

user=> (try (json "[1,2,,3,]") (catch clj_antlr.ParseError e (pprint @e)))
({:symbol #<CommonToken [@5,5:5=',',<4>,1:5]>,
  :line 1,
  :char 5,
  :message
  "extraneous input ',' expecting {'null', '{', '[', 'false', 'true', NUMBER, STRING}"}
 {:token #<CommonToken [@8,8:8=']',<1>,1:8]>,
  :expected #<IntervalSet {2..3, 5, 7, 9..10, 12}>,
  :state 25,
  :rule #<InterpreterRuleContext [51 15]>,
  :stack ("jsonText" "jsonArray" "jsonValue"),
  :symbol #<CommonToken [@8,8:8=']',<1>,1:8]>,
  :line 1,
  :char 8,
  :message
  "mismatched input ']' expecting {'null', '{', '[', 'false', 'true', NUMBER, STRING}"})

You can use the line and char numbers, in addition to the messages, to guide the user in generating correct syntax. Clj-antlr handles both lexer and parser errors, though the debugging information available at different passes may vary.

user=> (try (json "⊂") (catch clj_antlr.ParseError e (pprint @e)))
({:token nil,
  :expected nil,
  :state -1,
  :rule nil,
  :symbol nil,
  :line 1,
  :char 0,
  :message "token recognition error at: '⊂'"}
 {:token #<CommonToken [@0,1:0='<EOF>',<-1>,1:1]>,
  :expected #<IntervalSet {3, 5}>,
  :state 16,
  :rule #<InterpreterRuleContext []>,
  :symbol #<CommonToken [@0,1:0='<EOF>',<-1>,1:1]>,
  :line 1,
  :char 1,
  :message "no viable alternative at input '<EOF>'"})
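As a rough sketch of how the :line and :char fields might be used to guide a user, the helper below prints each message with a caret under the offending column. `explain` is a hypothetical function, not part of clj-antlr. It assumes :line is 1-based and :char is a 0-based column, which matches the outputs above, and the error maps are hand-written stand-ins with shortened messages for what `@e` would return.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: render one error map against the original text.
;; Assumes :line is 1-based and :char is a 0-based column.
(defn explain [text {:keys [line message] col :char}]
  (let [src (nth (str/split-lines text) (dec line) "")]
    (str "line " line ", char " col ": " message "\n"
         src "\n"
         (apply str (repeat col \space)) "^")))

;; Stand-ins for the maps yielded by dereferencing a ParseError
;; (messages abbreviated for the example).
(def errors
  [{:line 1 :char 5 :message "extraneous input ','"}
   {:line 1 :char 8 :message "mismatched input ']'"}])

(doseq [e errors]
  (println (explain "[1,2,,3,]" e)))
;; line 1, char 5: extraneous input ','
;; [1,2,,3,]
;;      ^
;; line 1, char 8: mismatched input ']'
;; [1,2,,3,]
;;         ^
```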

clj-antlr will still produce parse trees from invalid input. Use the {:throw? false} option, either when constructing the parser, or as an argument to the parse function.

user=> (->> "[1,2" (antlr/parse json {:throw? false}) pprint)
(:jsonText
 (:jsonArray
  "["
  (:jsonValue (:jsonNumber "1"))
  ","
  (:jsonValue (:jsonNumber "2"))))

Any parse errors will be available as metadata on the returned tree:

user=> (->> "[1,2" (antlr/parse json {:throw? false}) meta :errors pprint)
({:token #<CommonToken [@4,4:3='<EOF>',<-1>,1:4]>,
  :expected #<IntervalSet {1, 4}>,
  :state 54,
  :rule #<InterpreterRuleContext [15]>,
  :symbol #<CommonToken [@4,4:3='<EOF>',<-1>,1:4]>,
  :line 1,
  :char 4,
  :message "no viable alternative at input '<EOF>'"})

Sometimes, clj-antlr is able to identify invalid rules in the parse tree, and wrap them with a :clj-antlr/error node.

user=> (->> "[1, {\"foo\"::}, 3]" (antlr/parse json {:throw? false}) pprint)
(:jsonText
 (:jsonArray
  "["
  (:jsonValue (:jsonNumber "1"))
  ","
  (:jsonValue
   (:jsonObject
    "{"
    (:member "\"foo\"" ":" (:clj-antlr/error (:jsonValue ":")))
    "}"))
  ","
  (:jsonValue (:jsonNumber "3"))
  "]"))
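One way to locate these wrapped nodes is a plain `tree-seq` walk. This is a sketch rather than a clj-antlr feature: `error-nodes` is a hypothetical helper, and the tree is the output above pasted in as a quoted literal.

```clojure
;; The tree printed above, as a quoted literal.
(def tree
  '(:jsonText
    (:jsonArray
     "["
     (:jsonValue (:jsonNumber "1"))
     ","
     (:jsonValue
      (:jsonObject
       "{"
       (:member "\"foo\"" ":" (:clj-antlr/error (:jsonValue ":")))
       "}"))
     ","
     (:jsonValue (:jsonNumber "3"))
     "]")))

;; Hypothetical helper: return every subtree whose node name is :clj-antlr/error.
(defn error-nodes [node]
  (->> (tree-seq sequential? rest node)
       (filter #(and (sequential? %) (= :clj-antlr/error (first %))))))

(error-nodes tree)
;; => ((:clj-antlr/error (:jsonValue ":")))
```

An empty result does not prove the parse was clean, since error nodes are only sometimes inserted; the :errors metadata remains the more reliable signal.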

But not always. This input generates errors in the top-level :errors metadata map, but creates an invalid parse tree without any error nodes. I think this is a bug in clj-antlr or ANTLR itself; if you have suggestions, I'd like to hear them.

user=> (->> "[1,,3]" (antlr/parse json {:throw? false}) pprint)
(:jsonText
 (:jsonArray
  "["
  (:jsonValue (:jsonNumber "1"))
  ","
  (:jsonValue "," (:jsonNumber "3"))
  "]"))
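Given that, a cautious caller can treat the :errors metadata as the source of truth instead of scanning for error nodes. The sketch below uses a hypothetical `clean-parse?` helper and simulates a returned tree with `with-meta`; the error map attached here is a placeholder, not real clj-antlr output.

```clojure
;; Hypothetical helper: a parse counts as clean only if no errors were recorded.
(defn clean-parse? [tree]
  (empty? (:errors (meta tree))))

;; Simulated result for "[1,,3]": the tree shown above, with a placeholder
;; error map attached as metadata (its contents are invented for illustration).
(def suspicious
  (with-meta
    '(:jsonText
      (:jsonArray
       "["
       (:jsonValue (:jsonNumber "1"))
       ","
       (:jsonValue "," (:jsonNumber "3"))
       "]"))
    {:errors [{:line 1 :char 3 :message "placeholder message"}]}))

(clean-parse? suspicious)             ; => false
(clean-parse? '(:jsonText "ignored")) ; => true
```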

Options

All options may be passed at parser construction time:

user=> (antlr/parse (antlr/parser "grammars/Cadr.g4" {:case-sensitive? false})
                    "CdDr")
(:cadr "C" "d" "D" "r")

... and also overridden at parse time via clj-antlr.core/parse:

user=> (antlr/parse (antlr/parser "grammars/Cadr.g4")
                    {:case-sensitive? false}
                    "CdDr")
(:cadr "C" "d" "D" "r")

user=> (doc antlr/parser)
-------------------------
clj-antlr.core/parser
([filename] [filename opts])
  Constructs a new parser. Takes a filename for an Antlr v4 grammar. Options:

  :format           The parse tree to generate. One of
                      :sexp (default)  Nested lists, node names first
                      :raw             Equivalent to identity
                      <any function>   Takes a map of {:tree, :parser, etc}

  :root             The string name of the rule to begin parsing. Defaults to
                    the first rule in the grammar.

  :throw?           If truthy, parse errors will be thrown. Defaults true.

  :case-sensitive?  Whether the lexer must match the exact case of characters.
                    Defaults true. If false, the tokenizer will only receive
                    lowercase characters. The generated parse tree will still
                    retain the case of the original text.

  :use-alternates?  If truthy, uses the alternate name for a node, rather than
                    the rule name. Defaults false.

Where can I find grammars?

The antlr/grammars-v4 repository on GitHub has a ton of ANTLR 4 grammars for various languages!

Faster?

On a real-world 3.5KB JSON object, clj-antlr with a typical JSON grammar builds an AST about 100 times faster than Instaparse builds an identical one. Since Instaparse doesn't really separate grammar rules from lexer rules, I'm using regular expressions for strings, ints, etc.; but the transformation between grammars is pretty straightforward.

kingsbury@hackbook:~/clj-antlr master$ lein test :perf
Benchmarking instaparse
WARNING: Final GC required 1.645508727049022 % of runtime
Evaluation count : 660 in 60 samples of 11 calls.
             Execution time mean : 97.557634 ms
    Execution time std-deviation : 5.132833 ms
   Execution time lower quantile : 91.651987 ms ( 2.5%)
   Execution time upper quantile : 108.289375 ms (97.5%)
                   Overhead used : 10.328888 ns

Found 4 outliers in 60 samples (6.6667 %)
  low-severe   3 (5.0000 %)
  low-mild   1 (1.6667 %)
 Variance from outliers : 38.4948 % Variance is moderately inflated by outliers



Benchmarking clj-antlr
Evaluation count : 64440 in 60 samples of 1074 calls.
             Execution time mean : 958.366202 µs
    Execution time std-deviation : 36.434070 µs
   Execution time lower quantile : 901.210266 µs ( 2.5%)
   Execution time upper quantile : 1.032678 ms (97.5%)
                   Overhead used : 10.328888 ns

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
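The "about 100 times" figure follows from the two mean execution times in the output above. A quick check of the arithmetic (the var names are just for illustration):

```clojure
;; Mean execution times copied from the benchmark output above.
(def instaparse-mean-ms 97.557634)
(def clj-antlr-mean-us 958.366202)

;; Convert milliseconds to microseconds, then divide.
(def speedup
  (/ (* instaparse-mean-ms 1000.0) clj-antlr-mean-us))

speedup ; roughly 102
```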

License

Copyright © 2014 Kyle Kingsbury, and Factual, Inc. Includes ANTLR code under the BSD 3-clause license, written by Terence Parr and Sam Harwell. My sincerest appreciation to all ANTLR contributors as well. :)

Distributed under the Eclipse Public License, the same as Clojure.
