🐍 Complete C99 parser in pure Python

pycparser v2.21


1   Introduction

1.1   What is pycparser?

pycparser is a parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code.

1.2   What is it good for?

Anything that needs C code to be parsed. The following are some uses for pycparser, taken from real user reports:

  • C code obfuscator
  • Front-end for various specialized C compilers
  • Static code checker
  • Automatic unit-test discovery
  • Adding specialized extensions to the C language

One of the most popular uses of pycparser is in the cffi library, which uses it to parse the declarations of C functions and types in order to auto-generate FFIs.

pycparser is unique in the sense that it's written in pure Python - a very high-level language that's easy to experiment with and tweak. To people familiar with Lex and Yacc, pycparser's code will be simple to understand. It also has no external dependencies (except for a Python interpreter), making it very simple to install and deploy.

1.3   Which version of C does pycparser support?

pycparser aims to support the full C99 language (according to the standard ISO/IEC 9899). Some features from C11 are also supported, and patches to support more are welcome.

pycparser supports very few GCC extensions, but it's fairly easy to set things up so that it parses code with a lot of GCC-isms successfully. See the FAQ for more details.

1.4   What grammar does pycparser follow?

pycparser very closely follows the C grammar provided in Annex A of the C99 standard (ISO/IEC 9899).

1.5   How is pycparser licensed?

BSD license.

1.6   Contact details

For reporting problems with pycparser or submitting feature requests, please open an issue, or submit a pull request.

2   Installing

2.1   Prerequisites

  • pycparser was tested with Python 3.7+ on Linux, macOS and Windows.
  • pycparser has no external dependencies. The only non-stdlib library it uses is PLY, which is bundled in pycparser/ply. The current PLY version is 3.10, retrieved from http://www.dabeaz.com/ply/

Note that pycparser (and PLY) uses docstrings for grammar specifications. Python installations that strip docstrings (such as when using the Python -OO option) will fail to instantiate and use pycparser. You can try to work around this problem by making sure the PLY parsing tables are pre-generated in normal mode; this isn't an officially supported/tested mode of operation, though.
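The docstring requirement can be checked up front. The following is a minimal sketch, not part of pycparser: the helper names docstrings_available and require_docstrings are hypothetical, and the check relies only on the fact that -OO sets every function's __doc__ to None.

```python
def docstrings_available():
    """Return True if this interpreter keeps docstrings (not started with -OO)."""
    # Under -OO the docstring above is stripped, so __doc__ becomes None.
    return docstrings_available.__doc__ is not None


def require_docstrings():
    """Fail early, with a readable message, before pycparser is instantiated."""
    if not docstrings_available():
        raise RuntimeError(
            "Docstrings are stripped (python -OO); pycparser's grammar "
            "specifications live in docstrings and cannot be loaded."
        )
```

Calling require_docstrings() before creating a parser turns an obscure grammar failure into an explicit error.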

2.2   Installation process

The recommended way to install pycparser is with pip:

> pip install pycparser

3   Using

3.1   Interaction with the C preprocessor

In order to be compilable, C code must be preprocessed by the C preprocessor - cpp. cpp handles preprocessing directives like #include and #define, removes comments, and performs other minor tasks that prepare the C code for compilation.

For all but the most trivial snippets of C code, pycparser, like a C compiler, must receive preprocessed C code in order to function correctly. If you import the top-level parse_file function from the pycparser package, it will interact with cpp for you, as long as cpp is in your PATH or you provide a path to it.

Note also that you can use gcc -E or clang -E instead of cpp. See the using_gcc_E_libc.py example for more details. Windows users can download and install a binary build of Clang for Windows from this website.
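As a rough sketch of the gcc -E route, parse_file can be pointed at a compiler driver through its use_cpp, cpp_path and cpp_args parameters. The helpers build_cpp_args and parse_preprocessed below are hypothetical names introduced for illustration, and the sketch assumes that gcc (or clang) is on your PATH.

```python
from pycparser import parse_file


def build_cpp_args(include_dirs=(), defines=()):
    """Build the argument list for `gcc -E` or `clang -E` (hypothetical helper)."""
    args = ["-E"]  # stop after the preprocessing stage
    args += ["-I" + path for path in include_dirs]
    args += ["-D" + macro for macro in defines]
    return args


def parse_preprocessed(filename, compiler="gcc", include_dirs=(), defines=()):
    """Preprocess `filename` with `compiler -E`, then parse the output."""
    return parse_file(
        filename,
        use_cpp=True,
        cpp_path=compiler,  # assumed to be found on PATH
        cpp_args=build_cpp_args(include_dirs, defines),
    )
```

For example, parse_preprocessed("prog.c", include_dirs=["utils/fake_libc_include"]) would preprocess a file named prog.c (a placeholder name) against the fake headers described in the next section and return the AST.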

3.2   What about the standard C library headers?

C code almost always #includes various header files from the standard C library, like stdio.h. While (with some effort) pycparser can be made to parse the standard headers from any C compiler, it's much simpler to use the provided "fake" standard includes for C11 in utils/fake_libc_include. These are standard C header files that contain only the bare necessities to allow valid parsing of the files that use them. As a bonus, since they're minimal, using them can significantly improve the performance of parsing large C files.

The key point to understand here is that pycparser doesn't really care about the semantics of types. It only needs to know whether some token encountered in the source is a previously defined type. This is essential in order to be able to parse C correctly.
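To illustrate why type names matter, the sketch below parses the same statement, T * x;, with and without a preceding typedef. The snippet strings and the first_statement helper are made up for this illustration. With the typedef the statement should come out as a pointer declaration; without it, as a multiplication expression.

```python
from pycparser import c_ast, c_parser

WITH_TYPEDEF = "typedef int T; void f(void) { T * x; }"
WITHOUT_TYPEDEF = "void f(void) { T * x; }"


def first_statement(source):
    """Parse `source` and return the first item in the body of its last function."""
    parser = c_parser.CParser()  # fresh parser, so no typedef state is shared
    ast = parser.parse(source)
    func = ast.ext[-1]  # the FuncDef for f
    return func.body.block_items[0]


decl = first_statement(WITH_TYPEDEF)     # T is a known type: declaration of x
expr = first_statement(WITHOUT_TYPEDEF)  # T is a plain identifier: T times x

print(type(decl).__name__, type(expr).__name__)
```

This is why the fake headers only need to supply typedef names: the definitions behind them are irrelevant to the parser.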

See this blog post for more details.

Note that the fake headers are not included in the pip package nor installed via setup.py (#224).

3.3   Basic usage

Take a look at the examples directory of the distribution for a few examples of using pycparser. These should be enough to get you started. Please note that most realistic C code samples would require running the C preprocessor before passing the code to pycparser; see the previous sections for more details.
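As a starting point, here is a minimal sketch in the spirit of the bundled examples: it parses a small snippet that needs no preprocessing and lists the functions it defines. The SOURCE string, FuncDefCollector and function_names are illustrative names, not part of pycparser.

```python
from pycparser import c_ast, c_parser

# A self-contained snippet: no #include or #define, so cpp is not needed.
SOURCE = """
int add(int a, int b) { return a + b; }
int main(void) { return add(1, 2); }
"""


class FuncDefCollector(c_ast.NodeVisitor):
    """Collect the name of every function definition in an AST."""

    def __init__(self):
        self.names = []

    def visit_FuncDef(self, node):
        # node.decl is the Decl node that carries the function's name
        self.names.append(node.decl.name)


def function_names(source):
    """Parse C source text and return the names of the functions it defines."""
    ast = c_parser.CParser().parse(source, filename="<string>")
    collector = FuncDefCollector()
    collector.visit(ast)
    return collector.names


print(function_names(SOURCE))
```

NodeVisitor dispatches on the node class name, so adding a visit_Decl or visit_FuncCall method follows the same pattern.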

3.4   Advanced usage

The public interface of pycparser is well documented with comments in pycparser/c_parser.py. For a detailed overview of the various AST nodes created by the parser, see pycparser/_c_ast.cfg.
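To get a feel for which node classes appear in practice, one option is to walk an AST generically through each node's children() method. The sketch below counts node types for a one-line declaration; node_type_counts is a hypothetical helper written for this illustration.

```python
from collections import Counter

from pycparser import c_parser


def node_type_counts(node, counts=None):
    """Recursively count AST nodes by class name, starting at `node`."""
    counts = Counter() if counts is None else counts
    counts[type(node).__name__] += 1
    # children() yields (attribute_name, child_node) pairs
    for _name, child in node.children():
        node_type_counts(child, counts)
    return counts


ast = c_parser.CParser().parse("int x = 1;")
counts = node_type_counts(ast)
print(sorted(counts.items()))
```

Calling ast.show() prints the same tree in indented form, which is often the quickest way to match source constructs to the node definitions in _c_ast.cfg.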

There's also a FAQ available here. In any case, you can always drop me an email for help.

4   Modifying

There are a few points to keep in mind when modifying pycparser:

  • The code for pycparser's AST nodes is automatically generated from a configuration file - _c_ast.cfg, by _ast_gen.py. If you modify the AST configuration, make sure to re-generate the code. This can be done by running the _build_tables.py script from the pycparser directory.
  • Make sure you understand the optimized mode of pycparser - for that you must read the docstring in the constructor of the CParser class. For development you should create the parser without optimizations, so that it will regenerate the Yacc and Lex tables when you change the grammar.
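A development parser without optimizations might be created as sketched below. This assumes the PLY-based CParser constructor of the 2.x series, with its lex_optimize, yacc_optimize, yacc_debug and taboutputdir parameters. The make_dev_parser helper is hypothetical, and sending generated tables to a temporary directory is a choice made for this sketch, not a project recommendation.

```python
import tempfile

from pycparser import c_parser


def make_dev_parser(table_dir=None):
    """Create a CParser that rebuilds its Lex and Yacc tables from the grammar."""
    # Keep regenerated table files out of the current working directory.
    table_dir = table_dir or tempfile.mkdtemp(prefix="pycparser_tables_")
    return c_parser.CParser(
        lex_optimize=False,   # do not rely on a pre-generated lexer table
        yacc_optimize=False,  # re-check the grammar and rebuild tables if it changed
        yacc_debug=False,     # set to True to also write a parser.out debug file
        taboutputdir=table_dir,
    )


parser = make_dev_parser()
ast = parser.parse("int x;")
```

Construction is noticeably slower in this mode, so it is best kept to development and test runs.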

5   Package contents

Once you unzip the pycparser package, you'll see the following files and directories:

README.rst:
This README file.
LICENSE:
The pycparser license
setup.py:
Installation script
examples/:
A directory with some examples of using pycparser
pycparser/:
The pycparser module source code.
tests/:
Unit tests.
utils/fake_libc_include:
Minimal standard C library include files that should allow parsing of any C code. Note that these headers now include C11 code, so they may not work when the preprocessor is configured for an earlier C standard (like -std=c99).
utils/internal/:
Internal utilities for my own use. You probably don't need them.

6   Contributors

Some people have contributed to pycparser by opening issues on bugs they've found and/or submitting patches. The list of contributors is in the CONTRIBUTORS file in the source distribution. After pycparser moved to GitHub I stopped updating this list because GitHub does a much better job at tracking contributions.
