• Stars
    star
    606
  • Rank 73,958 (Top 2 %)
  • Language
    Python
  • License
    Other
  • Created over 12 years ago
  • Updated over 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Bitey - Bitcode Import Tool

Import LLVM bitcode directly into Python and use it as an extension module.

Warning

THIS IS PROOF-OF-CONCEPT SOFTWARE THAT DESPITE ITS CUTE NAME, MIGHT ACTUALLY BITE YOUR ENTIRE LEG OFF. USE AT YOUR OWN RISK!

Requirements

You'll need to have a pretty complete LLVM development environment installed on your machine. Bitey has been developed using LLVM/Clang-3.1. You might need to install it yourself.

In addition, you need to install the llvm-py extension. Get it at at http://www.llvmpy.org.

Bitey is unlikely to work with any older version of LLVM or the llvm-py extension--especially preinstalled versions distributed with your operating system. You need to be using bleeding-edge modern versions of these libraries.

Example and Basic Tutorial

First, you need some C code. Something important like computing a fibonacci number:

/* fib.c */
int fib(int n) {
    if (n < 3) {
       return 1;
    } else {
       return fib(n-1) + fib(n-2);
    }
}

Now, compile it into LLVM bitcode using clang:

bash % clang -emit-llvm -c fib.c

This makes an object file fib.o as usual--only the .o file contains LLVM bitcode. Now, just import it into Python:

>>> import bitey
>>> import fib
>>> fib.fib(38)
39088169
>>>

Yes, that's it. Bitey does not use the C compiler, the linker, or the dynamic loader. You don't write wrapper functions either. Write normal C, compile it with clang, and import it. Done.

Bitey understands most basic C datatypes including integers, floats, void, pointers, arrays, and structures. Because it builds a ctypes based interface, you would access the code using the same techniques. Here is an example that mutates a value through a pointer:

/* mutate.c */

void mutate_int(int *x) {
     *x *= 2;
}

Here's how you would use this:

% clang -emit-llvm -c mutate.c
% python
>>> import bitey
>>> import mutate
>>> import ctypes
>>> x = ctypes.c_int(2)
>>> mutate.mutate_int(x)
>>> x.value
4
>>>

Here is an example involving a structure:

/* point.c */
#include <math.h>

struct Point {
    double x;
    double y;
};

double distance(struct Point *p1, struct Point *p2) {
    return sqrt((p1->x - p2->x)*(p1->x - p2->x) +
                (p1->y - p2->y)*(p1->y - p2->y));
}

To run:

% clang -emit-llvm -c point.c
% python
>>> import bitey
>>> import point
>>> p1 = point.Point(3,4)
>>> p2 = point.Point(6,8)
>>> point.distance(p1,p2)
5.0
>>>

One subtle issue with structure wrapping is that LLVM bitcode doesn't encode the names of structure fields. So, Bitey simply assigns them to an indexed element variable like this:

>>> p1.e0         # (Returns the .x component)
3
>>> p1.e1         # (Returns the .y component)
4

This can be fixed using a pre-load module as described in the "Advanced Topics" section later.

If you need to combine two LLVM object files together into a single importable module, use llvm-ld like this:

% llvm-ld point.o fib.o -b combined.o
% python
>>> import bitey
>>> import combined
>>> combined.fib(10)
55
>>> p1 = combined.Point(3,4)
>>> p2 = combined.Point(6,8)
>>> combined.distance(p1,p2)
5.0
>>>

The C code you write can link with external libraries, but you might need to take special steps to load the library prior to import. For example, suppose you compiled the Fibonacci code into a shared library like this:

# OS-X
% gcc -bundle -export_dynamic fib.c -o libfib.so

# Linux
% gcc -shared fib.c -o libfib.so

Now, suppose you had some C code that wanted to access this library:

/* sample.c */
#include <stdio.h>
extern int fib(int n);

void print_fib(int n) {
    while (n > 0) {
        printf("%d\n", fib(n));
        n--;
    }
}

If you try to build it normally, you'll get an error:

% clang -emit-llvm -c sample.c
% python
>>> import bitey
>>> import sample
LLVM ERROR: Program used external function 'fib' which could not be resolved!
%

However, you can load the library yourself doing this:

 % python
 >>> import bitey
 >>> bitey.load_library("./libfib.so")
<CDLL './libfib.so', handle 1003cfc60 at 10049d090>
 >>> import sample
 >>> sample.print_fib(10)
 55
 34
 21
 13
 8
 5
 3
 2
 1
 1
 >>>

It is important to note that Bitey is NOT a wrapper generator meant to access already-compiled C libraries. It only exposes functionality that has been explicitly compiled as LLVM bitcode. To access the contents of a library, you would need to compile and link it using clang and llvm-ld as shown in the examples.

How it works

Bitey extends Python with an import hook that looks for .o files containing LLVM bitcode. Type signatures and other information in the bitcode are then used to build a ctypes-based binding to the natively compiled functions contained within an LLVM execution engine. It's all a bit magical, but the LLVM JIT generates the executable code whereas Bitey makes the ctypes binding to it---all behind the scenes on import.

It's important to stress that Bitey does not use the C compiler, the linker, the dynamic loader, or make calls to subprocesses. It is completely self-contained and only uses the functionality of llvm-py and ctypes.

Performance

The performance profile of Bitey is going to be virtually identical that of using ctypes. LLVM bitcode is translated to native machine code and Bitey builds a ctypes-based interface to it in exactly the same manner as a normal C library.

As a performance experiment, here is a simple C function that checks if a number is prime or not:

int isprime(int n) {
    int factor = 3;
    /* Special case for 2 */
    if (n == 2) {
        return 1;
    }
    /* Check for even numbers */
    if ((n % 2) == 0) {
       return 0;
    }
    /* Check for everything else */
    while (factor*factor < n) {
        if ((n % factor) == 0) {
            return 0;
        }
        factor += 2;
    }
    return 1;
 }

Try compiling this code into LLVM and a C shared library:

% clang -O3 -emit-llvm -c isprime.c

# OS-X
% gcc -O3 -bundle -undefined dynamic_lookup isprime.c -o isprime.so

# Linux
% gcc -O3 -shared isprime.c -o isprime.so

Now, let's put Bitey and ctypes in a head-to-head performance battle:

>>> import bitey
>>> from isprime import isprime as isprime1
>>> import ctypes
>>> ex = ctypes.cdll.LoadLibrary("./isprime.so")
>>> isprime2 = ex.isprime
>>> isprime2.argtypes=(ctypes.c_int,)
>>> isprime2.restype=ctypes.c_int
>>>
>>> from timeit import timeit
>>> # Bitey
>>> timeit("isprime1(3)","from __main__ import isprime1")
1.1813910007476807
>>> # ctypes
>>> timeit("isprime2(3)", "from __main__ import isprime2")
1.2408909797668457
>>>
>>> # Bitey
>>> timeit("isprime1(10143937)", "from __main__ import isprime1")
9.839216947555542
>>> # ctypes
>>> timeit("isprime2(10143937)", "from __main__ import isprime2")
9.663991212844849
>>>

As you can see, the performance is just about the same. The main difference would come down to the efficiency of LLVM vs. gcc code optimization.

Advanced Usage

If you're up for a bit of adventure, the module creation process can be altered through the use of pre and post loading files.

A pre-load file provides Python code that executes within the newly created module prior to the LLVM-binding step. One use of this code is to specify the names of fields on data structures. For example, you can create the following pre-load file for the earlier Point example:

# point.pre.py

class Point:
    _fields_ = ['x','y']

If you do this, you'll find that the field-names get fixed:

>>> import point
>>> p = point.Point(3,4)
>>> p.x
3.0
>>> p.y
4.0
>>>

You could also use a pre-load file to load library dependencies:

# sample.pre.py
import bitey
bitey.load_library("./libfoo.so")

A post-load file allows you alter the contents of the module after LLVM-binding. You could use this to apply decorators or add additional support code. For example:

# point.post.py
#
# Example of decorating a function already wrapped

def decorate(func):
    def wrapper(*args, **kwargs):
        print "Calling", func.__name__
        return func(*args, **kwargs)
    wrapper.__name__ = func.__name__
    return wrapper

# Wrap the distance wrapper already created
distance = decorate(distance)

The combination of the pre/post loading files gives you almost unlimited opportunity for insane evil when loading the bitcode. It must be stressed that that these files are executed in the space of the module being created---they are not separate imports (i.e., the pre, post, and LLVM bindings all co-exist in the same module namespace).

Automatic Binding

In the examples, it is necessary to use import bitey for modules to be recognized and loaded. If you want to skip this step and make everything automatic, create a bitey.pth file that contains the following statement:

# bitey.pth
import bitey

Now, copy this file to the Python site-packages directory.

FAQ

Q: Will Bitey ever support C++?

A: No. C++ can bite me (*)

(*) I also wrote Swig and still have C++ scars.

Q: Why is it called "Bitey?"

A: Well, "Bitey" is so much more catchy than simply calling it something boring like "BIT (Bitcode Import Tool)". Plus, just like @johnderosa's pet Pomeranian of the same name, you're never quite sure whether "Bitey" is adorably cute or a viscious beast that will constantly nip your leg. Actually, I just like the ring of it--"Bitey" sort of rhymes with "Enterprisey".

Discussion Group

A discussion group for Bitey is available at http://groups.google.com/group/bitey

Authors

More Repositories

1

curio

Good Curio!
Python
4,001
star
2

python-cookbook

Code samples from the "Python Cookbook, 3rd Edition", published by O'Reilly & Associates, May, 2013.
Python
3,864
star
3

ply

Python Lex-Yacc
Python
2,717
star
4

sly

Sly Lex Yacc
Python
804
star
5

dataklasses

A different spin on dataclasses.
Python
782
star
6

generators

Generator Tricks for Systems Programmers (Tutorial)
422
star
7

cluegen

Get a clue, get some code
Python
355
star
8

thredo

Thredo was an experiment - It's dead. Feel free to look around.
Python
338
star
9

blog

David Beazley's blog.
261
star
10

concurrencylive

Code from Concurrency Live - PyCon 2015
Python
157
star
11

python-distilled

Resources for Python Distilled (Book)
85
star
12

modulepackage

Materials for PyCon2015 Tutorial "Modules and Packages : Live and Let Die"
Python
74
star
13

wadze

Web Assembly Decoder - Zero Extras
Python
70
star
14

me-al

MeαΊ—al - The Decorator
Python
64
star
15

pythonprog

Python
62
star
16

typemap

Typemap - The Annotator (TM)
Python
55
star
17

pylox

Python implementation of the Lox language from Robert Nystrom's Crafting Interpreters
Python
47
star
18

flail

Ball and Chain Decorators
Python
41
star
19

asm6502

A small 6502 assembler written in Python
Python
21
star
20

ranet

An Implementation of Raft in Janet
18
star
21

hoppy

9
star
22

raft_jun_2019

Raft Course 2019
Python
9
star
23

colonel

C Curio Kernel
Python
8
star
24

raft_dec19

Raft Project December 2019
Python
8
star
25

theater

Code for "The Problem with the Problem" talk, February 22, 2022.
Python
7
star
26

dabeaz.github.io

HTML
4
star
27

frothy

Frothy example from CSCI1730 Fall 2023
Racket
3
star
28

archive

Software Archive
3
star