  • Stars: 1,584
  • Rank: 29,535 (top 0.6%)
  • Language: Yacc
  • License: Other
  • Created over 11 years ago
  • Updated 3 months ago

A Ruby parser.

Parser


Parser is a production-ready Ruby parser written in pure Ruby. It recognizes as much code as Ripper, Melbourne, JRubyParser or ruby_parser, or more, and is vastly more convenient to use.

You can also use unparser to produce equivalent source code from Parser's ASTs.

Sponsored by Evil Martians. MacRuby and RubyMotion support sponsored by CodeClimate.

Installation

$ gem install parser

Usage

Load Parser (see the backwards compatibility section below for an explanation of the emit_* calls):

require 'parser/current'
# opt in to the most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true

Parse a chunk of code:

p Parser::CurrentRuby.parse("2 + 2")
# (send
#   (int 2) :+
#   (int 2))

Access the AST's source map:

p Parser::CurrentRuby.parse("2 + 2").loc
# #<Parser::Source::Map::Send:0x007fe5a1ac2388
#   @dot=nil,
#   @begin=nil,
#   @end=nil,
#   @selector=#<Source::Range (string) 2...3>,
#   @expression=#<Source::Range (string) 0...5>>

p Parser::CurrentRuby.parse("2 + 2").loc.selector.source
# "+"

To traverse the AST, see the documentation for the ast gem.

Parse a chunk of code and display all diagnostics:

parser = Parser::CurrentRuby.new
parser.diagnostics.consumer = lambda do |diag|
  puts diag.render
end

buffer = Parser::Source::Buffer.new('(string)', source: "foo *bar")

p parser.parse(buffer)
# (string):1:5: warning: `*' interpreted as argument prefix
# foo *bar
#     ^
# (send nil :foo
#   (splat
#     (send nil :bar)))

If you reuse the same parser object for multiple #parse runs, you need to call #reset on it between runs.

You can also use the ruby-parse utility (it's bundled with the gem) to play with Parser:

$ ruby-parse -L -e "2+2"
(send
  (int 2) :+
  (int 2))
2+2
 ~ selector
~~~ expression
(int 2)
2+2
~ expression
(int 2)
2+2

$ ruby-parse -E -e "2+2"
2+2
^ tINTEGER 2                                    expr_end     [0 <= cond] [0 <= cmdarg]
2+2
 ^ tPLUS "+"                                    expr_beg     [0 <= cond] [0 <= cmdarg]
2+2
  ^ tINTEGER 2                                  expr_end     [0 <= cond] [0 <= cmdarg]
2+2
  ^ false "$eof"                                expr_end     [0 <= cond] [0 <= cmdarg]
(send
  (int 2) :+
  (int 2))

Features

  • Precise source location reporting.
  • Documented AST format which is convenient to work with.
  • A simple interface and a powerful, tweakable one.
  • Parses 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1, and 3.2 syntax with backwards-compatible AST formats.
  • Parses MacRuby and RubyMotion syntax extensions.
  • Rewriting support.
  • Parsing error recovery.
  • Improved clang-like diagnostic messages with location information.
  • Written in pure Ruby; runs on MRI >= 2.0.0, JRuby and Rubinius (and historically, on all versions of Ruby since 1.8).
  • Only one runtime dependency: the ast gem.
  • Insane Ruby lexer rewritten from scratch in Ragel.
  • 100% test coverage for Bison grammars (except error recovery).
  • Readable, commented source code.

Documentation

Documentation for Parser is available online.

Node names

Several Parser nodes seem to be confusing enough to warrant a dedicated README section.

(block)

The (block) node passes a Ruby block, that is, a closure, to a method call represented by its first child, a (send), (super) or (zsuper) node. To demonstrate:

$ ruby-parse -e 'foo { |x| x + 2 }'
(block
  (send nil :foo)
  (args
    (arg :x))
  (send
    (lvar :x) :+
    (int 2)))

(begin) and (kwbegin)

TL;DR: Unless you perform rewriting, treat (begin) and (kwbegin) as the same node type.

Both (begin) and (kwbegin) nodes represent compound statements, that is, several expressions that are executed sequentially, where the value of the last one is the value of the entire compound statement. They may take several forms in the source code:

  • foo; bar: without delimiters
  • (foo; bar): parenthesized
  • begin foo; bar; end: grouped with begin keyword
  • def x; foo; bar; end: grouped inside a method definition

and so on.

$ ruby-parse -e '(foo; bar)'
(begin
  (send nil :foo)
  (send nil :bar))
$ ruby-parse -e 'def x; foo; bar end'
(def :x
  (args)
  (begin
    (send nil :foo)
    (send nil :bar)))

Note that, despite its name, the kwbegin node has only a tangential relation to the begin keyword. Normally, the Parser AST is semantic, that is, if two constructs look different but behave identically, they are parsed into the same node. However, Ruby has a peculiar construct called the post-loop:

begin
  body
end while condition

This specific syntactic construct, that is, a keyword begin..end block followed by a postfix while, behaves very differently from other similar constructs such as (body) while condition: its body is executed once before the condition is first checked. Although the body itself is wrapped in a while-post node, Parser also supports rewriting, and in that context it is important not to accidentally convert one kind of loop into the other.

$ ruby-parse -e 'begin foo end while cond'
(while-post
  (send nil :cond)
  (kwbegin
    (send nil :foo)))
$ ruby-parse -e 'foo while cond'
(while
  (send nil :cond)
  (send nil :foo))
$ ruby-parse -e '(foo) while cond'
(while
  (send nil :cond)
  (begin
    (send nil :foo)))

(Parser also needs the (kwbegin) node type internally, and it is highly problematic to map it back to (begin).)

Backwards compatibility

Parser does not use semantic versioning. Parser versions are structured as x.y.z.t, where x.y.z indicates the most recent supported Ruby release (support for every Ruby release that is chronologically earlier is implied), and t is a monotonically increasing number.
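As a sketch of the x.y.z.t scheme, the hypothetical helper below splits a version string into the supported Ruby release and the counter. The string "3.2.2.4" is only an illustrative input; the installed gem's own version is available as Parser::VERSION.

```ruby
# Hypothetical helper: splits a Parser version string of the form x.y.z.t into
# the most recent supported Ruby release (x.y.z) and the counter t.
def split_parser_version(version)
  parts = version.split('.')
  unless parts.size == 4
    raise ArgumentError, "expected x.y.z.t, got #{version.inspect}"
  end

  { ruby: parts[0, 3].join('.'), counter: Integer(parts[3]) }
end

info = split_parser_version("3.2.2.4")
p info[:ruby]     # "3.2.2"
p info[:counter]  # 4
```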

The public API of Parser as well as the AST format (as listed in the documentation) are considered stable forever, although support for old Ruby versions may be removed at some point.

Sometimes it is necessary to modify the format of AST nodes that are already being emitted in a way that would break existing applications. To avoid such breakage, applications must opt-in to these modifications; without explicit opt-in, Parser will continue to emit the old AST node format. The most recent set of opt-ins is specified in the usage section of this README.

Compatibility with Ruby MRI

Unfortunately, Ruby MRI often changes syntax in patchlevel versions. This has happened, at least, for every release since 1.9; for example, commits c5013452 and 04bb9d6b were backported all the way from HEAD to 1.9. Moreover, there is no simple way to track these changes.

This policy makes it all but impossible to make Parser precisely compatible with the Ruby MRI parser. Indeed, as of September 2014, it would have been necessary to maintain and update ten different parsers, together with their lexer quirks, to be able to emulate any given released Ruby MRI version.

As a result, Parser chooses a different path: the parser/rubyXY parsers recognize the syntax of the latest minor version of Ruby MRI X.Y at the time of the gem release.

Compatibility with MacRuby and RubyMotion

Parser implements the MacRuby 0.12 and RubyMotion mid-2015 parsers precisely. However, the lexers of these have been forked off Ruby MRI and independently maintained for some time, and because of that, Parser may accept some code that these upstream implementations are unable to parse.

Known issues

Adding support for the following Ruby MRI features to Parser would needlessly complicate it, and since they are all very specific, rarely occurring corner cases, this has not been done.

Parser has been extensively tested; in particular, it parses almost the entire RubyGems corpus. For every issue, a breakdown of affected gems is offered.

Void value expressions

Ruby MRI prohibits so-called "void value expressions". For a description of what a void value expression is, see this gist and this Parser issue.

It is unknown whether any gems are affected by this issue.

Invalid characters inside comments and literals

Ruby MRI permits arbitrary non-7-bit byte sequences to appear in comments, as well as in string or symbol literals in form of escape sequences, regardless of source encoding. Parser requires all source code, including the expanded escape sequences, to consist of valid byte sequences in the source encoding that are convertible to UTF-8.

As of 2013-07-25, there are about 180 affected gems.
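If sources of unknown quality are fed to Parser, a rough pre-check of the raw bytes can catch many of these cases early. The sketch below is a hypothetical helper using only Ruby's standard String and Encoding APIs; it is not part of Parser and, unlike Parser, it does not expand escape sequences inside string or symbol literals.

```ruby
# Hypothetical pre-check: are the bytes valid in the given encoding, and can
# they be converted to UTF-8? Escape sequences inside literals are NOT expanded.
def convertible_source?(bytes, encoding)
  source = bytes.dup.force_encoding(encoding)
  return false unless source.valid_encoding?

  source.encode(Encoding::UTF_8)
  true
rescue EncodingError
  false
end

p convertible_source?("# caf\xC3\xA9\n", Encoding::UTF_8)  # true
p convertible_source?("# \xFF\n", Encoding::UTF_8)          # false
p convertible_source?("# \xFF\n", Encoding::ISO_8859_1)     # true
```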

\u escape in 1.8 mode

Ruby MRI 1.8 permits a bare \u escape sequence in a string and treats it like u. Ruby MRI 1.9 and later treat \u as the prefix of a Unicode escape sequence and do not allow it to appear bare. Parser follows the 1.9+ behavior.

As of 2013-07-25, affected gems are: activerdf, activerdf_net7, fastreader, gkellog-reddy.

Dollar-dash

(This one is so obscure I couldn't even think of a saner name for this issue.) Pre-2.1 Ruby allows a global variable named $- to be specified. Ruby 2.1 and later treat it as a syntax error. Parser follows the 2.1 behavior.

No known code is affected by this issue.

EOF characters after embedded documents before 2.7

Code like "=begin\n""=end\0" is invalid for all versions of Ruby before 2.7. Ruby 2.7 and later parse it normally. Parser follows the 2.7 behavior.

It is unknown whether any gems are affected by this issue.

Contributors

Acknowledgements

The lexer testsuite is derived from ruby_parser.

The Bison parser rules are derived from Ruby MRI parse.y.

Contributing

  1. Make sure you have Ragel ~> 6.7 installed
  2. Fork it
  3. Create your feature branch (git checkout -b my-new-feature)
  4. Commit your changes (git commit -am 'Add some feature')
  5. Push to the branch (git push origin my-new-feature)
  6. Create new Pull Request
