• Stars
    star
    2,681
  • Rank 16,315 (Top 0.4 %)
  • Language
    Python
  • Created over 13 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Python Lex-Yacc

PLY - Python Lex-Yacc

Author: David Beazley (https://www.dabeaz.com)

Introduction

PLY is a zero-dependency Python implementation of the traditional parsing tools lex and yacc. It uses the same LALR(1) parsing algorithm as yacc and has most of its core features. It is compatible with all modern versions of Python.

PLY was originally created in 2001 to support an Introduction to Compilers course at the University of Chicago. As such, it has almost no features other than the core LALR(1) parsing algorithm. This is by design--students should be made to suffer. Well, at least a little bit. However, from a more practical point of view, there is a lot flexibility in terms of how you decide to use it. You can use PLY to build Abstract Syntax Trees (ASTs), simple one-pass compilers, protocol decoders, or even a more advanced parsing framework.

Important Notice - October 27, 2022

The PLY project will make no further package-installable releases. If you want the latest version, you'll need to download it here or clone the repo.

Requirements

The current release of PLY requires the use of Python 3.6 or greater. If you need to support an older version, download one of the historical releases at https://github.com/dabeaz/archive/tree/main/ply.

How to Install and Use

Although PLY is open-source, it is not distributed or installed by package manager. There are only two files: lex.py and yacc.py, both of which are contained in a ply package directory. To use PLY, copy the ply directory into your project and import lex and yacc from the associated ply subpackage. Alternatively, you can install these files into your working python using make install.

from .ply import lex
from .ply import yacc

PLY has no third-party dependencies and can be freely renamed or moved around within your project as you see fit. It rarely changes.

Example

The primary use for PLY is writing parsers for programming languages. Here an example that parses simple expressions into an abstract syntax tree (AST):

# -----------------------------------------------------------------------------
# example.py
#
# Example of using PLY To parse the following simple grammar.
#
#   expression : term PLUS term
#              | term MINUS term
#              | term
#
#   term       : factor TIMES factor
#              | factor DIVIDE factor
#              | factor
#
#   factor     : NUMBER
#              | NAME
#              | PLUS factor
#              | MINUS factor
#              | LPAREN expression RPAREN
#
# -----------------------------------------------------------------------------

from ply.lex import lex
from ply.yacc import yacc

# --- Tokenizer

# All tokens must be named in advance.
tokens = ( 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN',
           'NAME', 'NUMBER' )

# Ignored characters
t_ignore = ' \t'

# Token matching rules are written as regexs
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

# A function can be used if there is an associated action.
# Write the matching regex in the docstring.
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Ignored token with an action associated with it
def t_ignore_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count('\n')

# Error handler for illegal characters
def t_error(t):
    print(f'Illegal character {t.value[0]!r}')
    t.lexer.skip(1)

# Build the lexer object
lexer = lex()
    
# --- Parser

# Write functions for each grammar rule which is
# specified in the docstring.
def p_expression(p):
    '''
    expression : term PLUS term
               | term MINUS term
    '''
    # p is a sequence that represents rule contents.
    #
    # expression : term PLUS term
    #   p[0]     : p[1] p[2] p[3]
    # 
    p[0] = ('binop', p[2], p[1], p[3])

def p_expression_term(p):
    '''
    expression : term
    '''
    p[0] = p[1]

def p_term(p):
    '''
    term : factor TIMES factor
         | factor DIVIDE factor
    '''
    p[0] = ('binop', p[2], p[1], p[3])

def p_term_factor(p):
    '''
    term : factor
    '''
    p[0] = p[1]

def p_factor_number(p):
    '''
    factor : NUMBER
    '''
    p[0] = ('number', p[1])

def p_factor_name(p):
    '''
    factor : NAME
    '''
    p[0] = ('name', p[1])

def p_factor_unary(p):
    '''
    factor : PLUS factor
           | MINUS factor
    '''
    p[0] = ('unary', p[1], p[2])

def p_factor_grouped(p):
    '''
    factor : LPAREN expression RPAREN
    '''
    p[0] = ('grouped', p[2])

def p_error(p):
    print(f'Syntax error at {p.value!r}')

# Build the parser
parser = yacc()

# Parse an expression
ast = parser.parse('2 * 3 + 4 * (5 - x)')
print(ast)

Documentation

The doc/ directory has more extensive documentation on PLY.

Examples

The example/ directory has various examples of using PLY.

Test Suite

PLY is not shipped with a test-suite. However, it is extensively tested before releases are made. The file ply-tests.tar.gz contains a current version of the tests should you want to run them on your own.

Bug Reports

PLY is mature software and new features are rarely added. If you think you have found a bug, please report an issue.

Questions and Answers

Q: Why does PLY have such a weird API?

Aside from simply being over 20 years old and predating many advanced Python features, PLY takes inspiration from two primary sources: the Unix yacc utility and John Aycock's Spark toolkit. yacc uses a convention of referencing grammar symbols by $1, $2, and so forth. PLY mirrors this using p[1], p[2], etc. Spark was a parsing toolkit that utilized introspection and Python docstrings for specifying a grammar. PLY borrowed that because it was clever. A modern API can be found in the related SLY project.

Q: Why isn't PLY distributed as a package?

PLY is highly specialized software that is really only of interest to people creating parsers for programming languages. In such a project, I think you should take responsibility for library dependencies. PLY is very small and rarely changes--if it works for your project, there is no reason to ever update it. At the same time, as it's author, I reserve the creative right to make occasional updates to the project, including those that might introduce breaking changes. The bottom line is that I'm not a link in your software supply chain. If you use PLY, you should copy it into your project.

Q: Do you accept pull requests and user contributions?

No. New features are not being added to PLY at this time. However, I am interested in bug reports should you find a problem with PLY. Feel free to use the issue-tracker for feature ideas--just because PLY is not in active development doesn't mean that good ideas will be ignored.

Q: Can I use PLY to make my own parsing tool?

Absolutely! It's free software that you are free to copy and modify in any way that makes sense. This includes remixing it into something that's more powerful. I'd just ask that you keep the original copyright notice in files based on the original PLY source code so that others know where it came from.

Acknowledgements

A special thanks is in order for all of the students in CS326 who suffered through about 25 different versions of this.

The CHANGES file acknowledges those who have contributed patches.

Elias Ioup did the first implementation of LALR(1) parsing in PLY-1.x. Andrew Waters and Markus Schoepflin were instrumental in reporting bugs and testing a revised LALR(1) implementation for PLY-2.0.

Take a Class!

If you'd like to learn more about compiler principles or have a go at implementing a compiler from scratch, come take a course. https://www.dabeaz.com/compiler.html.

More Repositories

1

curio

Good Curio!
Python
3,990
star
2

python-cookbook

Code samples from the "Python Cookbook, 3rd Edition", published by O'Reilly & Associates, May, 2013.
Python
3,837
star
3

sly

Sly Lex Yacc
Python
793
star
4

dataklasses

A different spin on dataclasses.
Python
783
star
5

bitey

Python
607
star
6

generators

Generator Tricks for Systems Programmers (Tutorial)
412
star
7

cluegen

Get a clue, get some code
Python
355
star
8

thredo

Thredo was an experiment - It's dead. Feel free to look around.
Python
338
star
9

blog

David Beazley's blog.
254
star
10

concurrencylive

Code from Concurrency Live - PyCon 2015
Python
154
star
11

python-distilled

Resources for Python Distilled (Book)
81
star
12

modulepackage

Materials for PyCon2015 Tutorial "Modules and Packages : Live and Let Die"
Python
73
star
13

wadze

Web Assembly Decoder - Zero Extras
Python
70
star
14

me-al

MeαΊ—al - The Decorator
Python
63
star
15

pythonprog

Python
61
star
16

typemap

Typemap - The Annotator (TM)
Python
55
star
17

pylox

Python implementation of the Lox language from Robert Nystrom's Crafting Interpreters
Python
46
star
18

flail

Ball and Chain Decorators
Python
41
star
19

asm6502

A small 6502 assembler written in Python
Python
20
star
20

ranet

An Implementation of Raft in Janet
18
star
21

raft_sep_19

Python
11
star
22

hoppy

9
star
23

raft_jun_2019

Raft Course 2019
Python
9
star
24

colonel

C Curio Kernel
Python
8
star
25

raft_dec19

Raft Project December 2019
Python
8
star
26

theater

Code for "The Problem with the Problem" talk, February 22, 2022.
Python
7
star
27

dabeaz.github.io

HTML
4
star
28

frothy

Frothy example from CSCI1730 Fall 2023
Racket
3
star
29

archive

Software Archive
3
star