How to write a JIT compiler

First up, you probably don't want to. JIT, or more accurately "dynamic code generation," is typically not the most effective way to optimize a project, and common techniques end up trading away a lot of portability and require fairly detailed knowledge about processor-level optimization.

That said, though, writing a JIT compiler is a lot of fun and a great way to learn stuff. The first thing to do is to write an interpreter.

NOTE: If you don't have a solid grasp of UNIX system-level programming, you might want to read about how to write a shell, which covers a lot of the fundamentals.

MandelASM

GPUs are fine for machine learning, but serious fractal enthusiasts design their own processors to generate Mandelbrot sets. And the first step in processor design, of course, is to write an emulator for it. Our emulator will interpret the machine code we want to run and emit an image to stdout.

To keep it simple, our processor has four complex-valued registers called a, b, c, and d, and it supports three in-place operations:

  • =ab: assign register a to register b
  • +ab: add register a to register b
  • *ab: multiply register b by register a

For each pixel, the interpreter will zero all of the registers and then set a to the current pixel's coordinates. It then iterates the machine code for up to 256 iterations waiting for register b to "overflow" (i.e. for its complex absolute value to exceed 2). That means that the code for a standard Mandelbrot set is *bb+ab.
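
To make that concrete, here's a rough sketch of what running *bb+ab on one pixel amounts to, written with C99's complex.h instead of the register machine. The name mandel_iterations is made up for this illustration and isn't part of the project; it just mirrors the per-pixel loop described above (b starts at zero, a holds the pixel's coordinates, at most 256 iterations).

```c
#include <complex.h>

/* Hypothetical helper: counts how many times "*bb+ab" runs for one pixel
 * before |b| reaches 2, capped at 256 iterations. */
int mandel_iterations(double complex a) {
  double complex b = 0;
  int i;
  for (i = 0; i < 256; ++i) {
    double re = creal(b), im = cimag(b);
    if (re * re + im * im >= 4) break;   /* |b|^2 >= 4 means |b| >= 2 */
    b = b * b + a;                       /* *bb, then +ab */
  }
  return i;
}
```

A point inside the set, such as 0 or -1, never escapes and uses all 256 iterations; a point far outside, such as 3, escapes after the first one.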

Simple interpreter

The first thing to do is write up a bare-bones interpreter in C. It would be simpler to use complex.h here, but I'm going to write it in terms of individual numbers because the JIT compiler will end up generating the longhand logic. In production code we'd include bounds-checks and stuff, but I'm omitting those here for simplicity.

// simple.c
#include <stdio.h>
#include <stdlib.h>

#define sqr(x) ((x) * (x))

typedef struct { double r; double i; } complex;

void interpret(complex *registers, char const *code) {
  complex *src, *dst;
  double r, i;
  for (; *code; code += 3) {
    dst = &registers[code[2] - 'a'];
    src = &registers[code[1] - 'a'];
    switch (*code) {
      case '=':
        dst->r = src->r;
        dst->i = src->i;
        break;
      case '+':
        dst->r += src->r;
        dst->i += src->i;
        break;
      case '*':
        r = dst->r * src->r - dst->i * src->i;
        i = dst->r * src->i + dst->i * src->r;
        dst->r = r;
        dst->i = i;
        break;
      default:
        fprintf(stderr, "undefined instruction %s (ASCII %x)\n", code, *code);
        exit(1);
    }
  }
}

int main(int argc, char **argv) {
  complex registers[4];
  int i, x, y;
  char line[1600];
  printf("P5\n%d %d\n%d\n", 1600, 900, 255);
  for (y = 0; y < 900; ++y) {
    for (x = 0; x < 1600; ++x) {
      registers[0].r = 2 * 1.6 * (x / 1600.0 - 0.5);
      registers[0].i = 2 * 0.9 * (y /  900.0 - 0.5);
      for (i = 1; i < 4; ++i) registers[i].r = registers[i].i = 0;
      for (i = 0; i < 256 && sqr(registers[1].r) + sqr(registers[1].i) < 4; ++i)
        interpret(registers, argv[1]);
      line[x] = i;
    }
    fwrite(line, 1, sizeof(line), stdout);
  }
  return 0;
}

Now we can see the results by using display from ImageMagick (apt-get install imagemagick), or by saving to a file:

$ gcc simple.c -o simple
$ ./simple *bb+ab | display -           # imagemagick version
$ ./simple *bb+ab > output.pgm          # save a grayscale PGM image
$ time ./simple *bb+ab > /dev/null      # quick benchmark
real	0m2.369s
user	0m2.364s
sys	0m0.000s
$
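
As noted, these programs skip input validation. For what it's worth, a check along the following lines could run before interpreting; it's a minimal sketch, and mandelasm_valid is a hypothetical name rather than something in the repository. The length test matters because interpret() advances three bytes at a time and would step over the terminating NUL if the program length weren't a multiple of three.

```c
#include <string.h>

/* Hypothetical validator: returns 1 if every 3-byte instruction has a known
 * opcode and both operands name one of the registers a..d, else 0. */
int mandelasm_valid(const char *code) {
  size_t n = strlen(code), k;
  if (n % 3 != 0) return 0;              /* instructions are exactly 3 bytes */
  for (k = 0; k < n; k += 3) {
    if (!strchr("=+*", code[k])) return 0;
    if (code[k + 1] < 'a' || code[k + 1] > 'd') return 0;
    if (code[k + 2] < 'a' || code[k + 2] > 'd') return 0;
  }
  return 1;
}
```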

Performance analysis

In the real world, JIT is absolutely the wrong move for this problem.

Array languages like APL, Matlab, and to a large extent Perl, Python, etc, manage to achieve reasonable performance by having interpreter operations that apply over a large number of data elements at a time. We've got exactly that situation here: in the real world it's a lot more practical to vectorize the operations to apply simultaneously to a screen-worth of data at a time -- then we'd have nice options like offloading stuff to a GPU, etc.
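
As a rough illustration of that vectorized style (and not something the rest of this guide uses), the sketch below decodes each instruction once and applies it to a whole batch of pixels. The batch_regs layout and the interpret_batch name are assumptions made for the example, and it leaves out per-pixel escape tracking, which a real renderer would need.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays register file: register k of pixel p
 * lives at re[k][p] / im[k][p]. */
typedef struct {
  double *re[4];
  double *im[4];
  size_t  n;        /* number of pixels in the batch */
} batch_regs;

/* Decode each instruction once, then apply it to all n pixels. */
void interpret_batch(batch_regs *v, const char *code) {
  size_t p;
  for (; *code; code += 3) {
    const double *sr = v->re[code[1] - 'a'], *si = v->im[code[1] - 'a'];
    double       *dr = v->re[code[2] - 'a'], *di = v->im[code[2] - 'a'];
    switch (*code) {
      case '=':
        for (p = 0; p < v->n; ++p) { dr[p] = sr[p]; di[p] = si[p]; }
        break;
      case '+':
        for (p = 0; p < v->n; ++p) { dr[p] += sr[p]; di[p] += si[p]; }
        break;
      case '*':
        for (p = 0; p < v->n; ++p) {
          /* temporaries keep this correct when src and dst alias (e.g. *bb) */
          double r = dr[p] * sr[p] - di[p] * si[p];
          double i = dr[p] * si[p] + di[p] * sr[p];
          dr[p] = r;
          di[p] = i;
        }
        break;
      default:
        return;     /* unknown opcode: stop (a real version would report it) */
    }
  }
}
```

The interpreter overhead (the switch and the operand decoding) is paid once per instruction per batch instead of once per instruction per pixel, which is the effect the array languages rely on.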

However, since the point here is to compile stuff, on we go.

JIT can basically eliminate the interpreter overhead, which we can easily model here by replacing interpret() with a hard-coded Mandelbrot calculation. This will provide an upper bound on realistic JIT performance, since we're unlikely to optimize as well as gcc does.

// hardcoded.c
#include <stdio.h>
#include <stdlib.h>

#define sqr(x) ((x) * (x))

typedef struct { double r; double i; } complex;

void interpret(complex *registers, char const *code) {
  complex *a = &registers[0];
  complex *b = &registers[1];
  double r, i;
  r = b->r * b->r - b->i * b->i;
  i = b->r * b->i + b->i * b->r;
  b->r = r;
  b->i = i;
  b->r += a->r;
  b->i += a->i;
}

int main(int argc, char **argv) {
  complex registers[4];
  int i, x, y;
  char line[1600];
  printf("P5\n%d %d\n%d\n", 1600, 900, 255);
  for (y = 0; y < 900; ++y) {
    for (x = 0; x < 1600; ++x) {
      registers[0].r = 2 * 1.6 * (x / 1600.0 - 0.5);
      registers[0].i = 2 * 0.9 * (y /  900.0 - 0.5);
      for (i = 1; i < 4; ++i) registers[i].r = registers[i].i = 0;
      for (i = 0; i < 256 && sqr(registers[1].r) + sqr(registers[1].i) < 4; ++i)
        interpret(registers, argv[1]);
      line[x] = i;
    }
    fwrite(line, 1, sizeof(line), stdout);
  }
  return 0;
}

This version runs nearly twice as fast as the simple interpreter (about 1.8x on this run):

$ gcc hardcoded.c -o hardcoded
$ time ./hardcoded *bb+ab > /dev/null
real	0m1.329s
user	0m1.328s
sys	0m0.000s
$

JIT design and the x86-64 calling convention

The basic strategy is to replace interpret(registers, code) with a function compile(code) that returns a pointer to a function with this signature: void compiled(complex *registers). The memory for the function needs to be allocated using mmap so we can set permissions that allow the processor to execute it.

The easiest way to start with something like this is probably to emit the assembly for simple.c to see how it works:

$ gcc -S simple.c

Edited/annotated highlights from the assembly simple.s, which is much more complicated than what we'll end up generating:

interpret:
        // The stack contains local variables referenced to the "base pointer"
        // stored in hardware register %rbp. Here's the layout:
        //
        //   double i  = -8(%rbp)
        //   double r  = -16(%rbp)
        //   src       = -24(%rbp)
        //   dst       = -32(%rbp)
        //   registers = -40(%rbp)      <- comes in as an argument in %rdi
        //   code      = -48(%rbp)      <- comes in as an argument in %rsi

        pushq   %rbp
        movq    %rsp, %rbp              // standard x86-64 function header
        subq    $48, %rsp               // allocate space for six local vars
        movq    %rdi, -40(%rbp)         // registers arg -> local var
        movq    %rsi, -48(%rbp)         // code arg -> local var
        jmp     for_loop_condition      // commence loopage

Before getting to the rest, I want to call out the %rsi and %rdi stuff and explain a bit about how calls work on x86-64. %rsi and %rdi seem arbitrary, which they are to some extent -- C obeys a platform-specific calling convention that specifies how arguments get passed in. On x86-64 Linux (the System V ABI), the first six integer or pointer arguments come in as registers, starting with %rdi and %rsi; after that they get pushed onto the stack. If you're returning an integer or pointer value, it goes into %rax.

The return address is automatically pushed onto the stack by call instructions like e8 <32-bit relative>. So internally, call behaves like push ADDRESS; jmp <callee>; ADDRESS: ..., and ret behaves like pop %rip, except that you can't actually pop into %rip. This means that on entry to a function, the return address is always the topmost value on the stack.

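Part of the calling convention also requires callees to preserve a handful of registers (%rbx, %rbp, and %r12 through %r15), and compilers conventionally set %rbp to a copy of %rsp at function entry. Our JIT can mostly ignore this stuff because the code it generates only touches %rdi and the %xmm registers, none of which are callee-saved, and it doesn't call back into C.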

for_loop_body:
        // (a bunch of stuff to set up *src and *dst)

        cmpl    $43, %eax               // case '+'
        je      add_branch
        cmpl    $61, %eax               // case '='
        je      assign_branch
        cmpl    $42, %eax               // case '*'
        je      mult_branch
        jmp     switch_default          // default

assign_branch:
        // the "bunch of stuff" above calculated *src and *dst, which are
        // stored in -24(%rbp) and -32(%rbp).
        movq    -24(%rbp), %rax         // %rax = src
        movsd   (%rax), %xmm0           // %xmm0 = src.r
        movq    -32(%rbp), %rax         // %rax = dst
        movsd   %xmm0, (%rax)           // dst.r = %xmm0

        movq    -24(%rbp), %rax         // %rax = src
        movsd   8(%rax), %xmm0          // %xmm0 = src.i
        movq    -32(%rbp), %rax         // %rax = dst
        movsd   %xmm0, 8(%rax)          // dst.i = %xmm0

        jmp     for_loop_step

add_branch:
        movq    -32(%rbp), %rax         // %rax = dst
        movsd   (%rax), %xmm1           // %xmm1 = dst.r
        movq    -24(%rbp), %rax         // %rax = src
        movsd   (%rax), %xmm0           // %xmm0 = src.r
        addsd   %xmm1, %xmm0            // %xmm0 += %xmm1
        movq    -32(%rbp), %rax         // %rax = dst
        movsd   %xmm0, (%rax)           // dst.r = %xmm0

        movq    -32(%rbp), %rax         // same thing for src.i and dst.i
        movsd   8(%rax), %xmm1
        movq    -24(%rbp), %rax
        movsd   8(%rax), %xmm0
        addsd   %xmm1, %xmm0
        movq    -32(%rbp), %rax
        movsd   %xmm0, 8(%rax)

        jmp     for_loop_step

mult_branch:
        movq    -32(%rbp), %rax
        movsd   (%rax), %xmm1
        movq    -24(%rbp), %rax
        movsd   (%rax), %xmm0
        mulsd   %xmm1, %xmm0
        movq    -32(%rbp), %rax
        movsd   8(%rax), %xmm2
        movq    -24(%rbp), %rax
        movsd   8(%rax), %xmm1
        mulsd   %xmm2, %xmm1
        subsd   %xmm1, %xmm0
        movsd   %xmm0, -16(%rbp)        // double r = src.r*dst.r - src.i*dst.i

        movq    -32(%rbp), %rax
        movsd   (%rax), %xmm1
        movq    -24(%rbp), %rax
        movsd   8(%rax), %xmm0
        mulsd   %xmm0, %xmm1
        movq    -32(%rbp), %rax
        movsd   8(%rax), %xmm2
        movq    -24(%rbp), %rax
        movsd   (%rax), %xmm0
        mulsd   %xmm2, %xmm0
        addsd   %xmm1, %xmm0
        movsd   %xmm0, -8(%rbp)         // double i = src.r*dst.i + src.i*dst.r

        movq    -32(%rbp), %rax
        movsd   -16(%rbp), %xmm0
        movsd   %xmm0, (%rax)           // dst.r = r
        movq    -32(%rbp), %rax
        movsd   -8(%rbp), %xmm0
        movsd   %xmm0, 8(%rax)          // dst.i = i
        jmp     for_loop_step

for_loop_step:
        addq    $3, -48(%rbp)

for_loop_condition:
        movq    -48(%rbp), %rax         // %rax = code (the pointer)
        movzbl  (%rax), %eax            // %eax = *code (move one byte)
        testb   %al, %al                // is %eax 0?
        jne     for_loop_body           // if no, then continue

        leave                           // otherwise rewind stack
        ret                             // pop and jmp

Compilation strategy

Most of the above is register-shuffling fluff that we can get rid of. We're compiling the code up front, which means all of our register addresses are known quantities and we won't need any unknown indirection at runtime. So all of the shuffling into and out of %rax can be replaced by a much simpler move directly to or from N(%rdi) -- since %rdi is the argument that points to the first register's real component.
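
Since the layout is fixed, each displacement N is simple arithmetic: 16 bytes per register, plus 8 for the imaginary part. Here's a small sketch of that calculation; register_disp and the cplx typedef are hypothetical names for this example, with cplx mirroring the complex struct used in simple.c.

```c
#include <stddef.h>

/* Same layout as the tutorial's complex struct; named cplx here only to
 * avoid clashing with the `complex` macro from <complex.h>. */
typedef struct { double r; double i; } cplx;

/* Hypothetical helper: byte offset from %rdi of one component of a
 * MandelASM register ('a'..'d'). imag = 0 selects .r, nonzero selects .i. */
int register_disp(char reg, int imag) {
  return (int) (sizeof(cplx) * (size_t) (reg - 'a')
                + (imag ? offsetof(cplx, i) : offsetof(cplx, r)));
}
```

With this layout, b.r lands at 16(%rdi) and b.i at 24(%rdi), and the largest offset (d.i, at 56) still fits comfortably in a one-byte displacement.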

If you haven't already, at this point I'd recommend downloading the Intel software developer's manual, of which volume 2 describes the semantics and machine code representation of every instruction.

NOTE: GCC uses AT&T assembly syntax, whereas the Intel manuals use Intel assembly syntax. An important difference is that AT&T reverses the arguments: mov %rax, %rbx (AT&T syntax) assigns to %rbx, whereas mov rax, rbx (Intel syntax) assigns to rax. All of my code examples use AT&T, and none of this will matter once we're working with machine code.

Example: the Mandelbrot function *bb+ab

// Step 1: multiply register B by itself
movsd 16(%rdi), %xmm0                   // %xmm0 = b.r
movsd 24(%rdi), %xmm1                   // %xmm1 = b.i
movsd 16(%rdi), %xmm2                   // %xmm2 = b.r
movsd 24(%rdi), %xmm3                   // %xmm3 = b.i
movsd %xmm0, %xmm4                      // %xmm4 = b.r
mulsd %xmm2, %xmm4                      // %xmm4 = b.r*b.r
movsd %xmm1, %xmm5                      // %xmm5 = b.i
mulsd %xmm3, %xmm5                      // %xmm5 = b.i*b.i
subsd %xmm5, %xmm4                      // %xmm4 = b.r*b.r - b.i*b.i
movsd %xmm4, 16(%rdi)                   // b.r = %xmm4

mulsd %xmm0, %xmm3                      // %xmm3 = b.r*b.i
mulsd %xmm1, %xmm2                      // %xmm2 = b.i*b.r
addsd %xmm3, %xmm2                      // %xmm2 = b.r*b.i + b.i*b.r
movsd %xmm2, 24(%rdi)                   // b.i = %xmm2

// Step 2: add register A to register B
movupd (%rdi), %xmm0                    // %xmm0 = (a.r, a.i)
addpd 16(%rdi), %xmm0                   // %xmm0 += (b.r, b.i)
movupd %xmm0, 16(%rdi)                  // (b.r, b.i) = %xmm0

The multiplication code isn't optimized for the squaring-a-register use case; instead, I left it fully general so we can use it as a template when we start generating machine code.

JIT mechanics

Before we compile a real language, let's just get a basic code generator working.

// jitproto.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

typedef long(*fn)(long);

fn compile_identity(void) {
  // Allocate some memory and set its permissions correctly. In particular, we
  // need PROT_EXEC (which isn't normally enabled for data memory, e.g. from
  // malloc()), which tells the processor it's ok to execute it as machine
  // code.
  char *memory = mmap(NULL,             // address
                      4096,             // size
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS,
                      -1,               // fd (not used here)
                      0);               // offset (not used here)
  if (memory == MAP_FAILED) {
    perror("failed to allocate memory");
    exit(1);
  }

  int i = 0;

  // mov %rdi, %rax
  memory[i++] = 0x48;           // REX.W prefix
  memory[i++] = 0x8b;           // MOV opcode, register/register
  memory[i++] = 0xc7;           // MOD/RM byte for %rdi -> %rax

  // ret
  memory[i++] = 0xc3;           // RET opcode

  return (fn) memory;
}

int main() {
  fn f = compile_identity();
  int i;
  for (i = 0; i < 10; ++i)
    printf("f(%d) = %ld\n", i, (*f)(i));
  munmap(f, 4096);
  return 0;
}

This does what we expect: we've just produced an identity function.

$ gcc jitproto.c -o jitproto
$ ./jitproto
f(0) = 0
f(1) = 1
f(2) = 2
f(3) = 3
f(4) = 4
f(5) = 5
f(6) = 6
f(7) = 7
f(8) = 8
f(9) = 9
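
The same three-byte mov works for any of the argument registers; only the REX prefix and ModR/M byte change. As a sketch under the x86-64 calling convention described earlier, the hypothetical emit_return_arg below writes the code for "return the Nth integer argument" into a caller-supplied buffer. It only produces the bytes, so copying them into executable memory is left to code like compile_identity().

```c
#include <stddef.h>

/* Hypothetical generalization of compile_identity(): writes the machine code
 * for "return the arg-th integer argument" (arg = 0..5) into buf, which must
 * have room for 4 bytes. Returns the number of bytes written, or 0 if arg is
 * out of range. Argument order follows the System V x86-64 convention:
 * %rdi, %rsi, %rdx, %rcx, %r8, %r9. */
size_t emit_return_arg(unsigned char *buf, int arg) {
  /* hardware register numbers of the six argument registers */
  static const unsigned char regnum[6] = { 7, 6, 2, 1, 8, 9 };
  unsigned char r;
  size_t n = 0;
  if (arg < 0 || arg > 5) return 0;
  r = regnum[arg];
  buf[n++] = 0x48 | (r >> 3);      /* REX.W, plus REX.B for %r8 and %r9 */
  buf[n++] = 0x8b;                 /* MOV r64, r/m64 */
  buf[n++] = 0xc0 | (r & 7);       /* ModR/M: mod=11, reg=%rax, r/m=source */
  buf[n++] = 0xc3;                 /* RET */
  return n;
}
```

For arg = 0 this produces 48 8b c7 c3, the same bytes compile_identity() writes by hand.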

TODO: explanation about userspace page mapping/permissions, and how ELF instructions tie into this (maybe also explain stuff like the FD table while we're at it)
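
On the permissions point, a common hardening step is to avoid memory that is writable and executable at the same time: map the page read/write, fill it in, then flip it to read/execute with mprotect. The sketch below assumes a POSIX system with MAP_ANONYMOUS (as jitproto.c already does); load_code is a hypothetical helper, not part of the project.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict -std modes */
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies n bytes of machine code into a fresh mapping
 * of `size` bytes, then switches it from read/write to read/execute.
 * Returns NULL on failure. */
void *load_code(const void *code, size_t n, size_t size) {
  void *memory;
  if (n > size) return NULL;
  memory = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (memory == MAP_FAILED) return NULL;
  memcpy(memory, code, n);
  if (mprotect(memory, size, PROT_READ | PROT_EXEC) != 0) {
    munmap(memory, size);
    return NULL;
  }
  return memory;
}
```

The trade-off is that the code can no longer be patched in place without another mprotect call, which doesn't matter for a compile-once design like this one.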

Generating MandelASM machine code

This is where we start to get some serious mileage out of the Intel manuals. We need encodings for the following instructions:

  • f2 0f 11: movsd reg -> memory
  • f2 0f 10: movsd memory -> reg
  • f2 0f 59: mulsd reg -> reg
  • f2 0f 58: addsd reg -> reg
  • f2 0f 5c: subsd reg -> reg
  • 66 0f 11: movpd reg -> memory (technically movupd for unaligned move)
  • 66 0f 10: movpd memory -> reg
  • 66 0f 58: addpd memory -> reg

The gnarly bits: how operands are specified

Chapter 2 of the Intel manual volume 2 contains a roundabout, confusing description of operand encoding, so I'll try to sum up the basics here. (TODO)

For the operators above, we've got two ModR/M configurations:

  • movsd reg <-> X(%rdi): mod = 01, r/m = 111, disp8 = X
  • addsd reg -> reg: mod = 11

At the byte level, they're written like this:

movsd %xmm0, 16(%rdi)           # f2 0f 11 47 10
  # modr/m = b01 000 111 = 47
  # disp   = 16          = 10

addsd %xmm3, %xmm4              # f2 0f 58 e3
  # modr/m = b11 100 011 = e3
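
The bit-packing is easier to see in C. This is a minimal sketch; modrm and encode_movsd_store are illustrative names rather than functions from the project, and the second one covers only the %rdi-plus-disp8 form shown above.

```c
#include <stddef.h>

/* Packs the three ModR/M fields into one byte: mod (2 bits), reg (3 bits),
 * r/m (3 bits), from most to least significant. */
unsigned char modrm(unsigned mod, unsigned reg, unsigned rm) {
  return (unsigned char) ((mod & 3) << 6 | (reg & 7) << 3 | (rm & 7));
}

/* movsd %xmm<reg>, disp(%rdi): opcode f2 0f 11, then ModR/M with mod=01
 * (8-bit displacement follows) and r/m=111 (%rdi), then the displacement.
 * Writes 5 bytes to out and returns that count. */
size_t encode_movsd_store(unsigned char *out, unsigned reg, signed char disp) {
  out[0] = 0xf2;
  out[1] = 0x0f;
  out[2] = 0x11;
  out[3] = modrm(1, reg, 7);
  out[4] = (unsigned char) disp;
  return 5;
}
```

Here modrm(1, 0, 7) gives 0x47 and modrm(3, 4, 3) gives 0xe3, matching the two hand-encoded examples.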

A simple micro-assembler

// micro-asm.h
#include <stdarg.h>
typedef struct {
  char *dest;
} microasm;

// this makes it more obvious what we're doing later on
#define xmm(n) (n)

void asm_write(microasm *a, int n, ...) {
  va_list bytes;
  int i;
  va_start(bytes, n);
  for (i = 0; i < n; ++i) *(a->dest++) = (char) va_arg(bytes, int);
  va_end(bytes);
}

void movsd_reg_memory(microasm *a, char reg, char disp)
{ asm_write(a, 5, 0xf2, 0x0f, 0x11, 0x47 | reg << 3, disp); }

void movsd_memory_reg(microasm *a, char disp, char reg)
{ asm_write(a, 5, 0xf2, 0x0f, 0x10, 0x47 | reg << 3, disp); }

void movsd_reg_reg(microasm *a, char src, char dst)
{ asm_write(a, 4, 0xf2, 0x0f, 0x11, 0xc0 | src << 3 | dst); }

void mulsd(microasm *a, char src, char dst)
{ asm_write(a, 4, 0xf2, 0x0f, 0x59, 0xc0 | dst << 3 | src); }

void addsd(microasm *a, char src, char dst)
{ asm_write(a, 4, 0xf2, 0x0f, 0x58, 0xc0 | dst << 3 | src); }

void subsd(microasm *a, char src, char dst)
{ asm_write(a, 4, 0xf2, 0x0f, 0x5c, 0xc0 | dst << 3 | src); }

void movpd_reg_memory(microasm *a, char reg, char disp)
{ asm_write(a, 5, 0x66, 0x0f, 0x11, 0x47 | reg << 3, disp); }

void movpd_memory_reg(microasm *a, char disp, char reg)
{ asm_write(a, 5, 0x66, 0x0f, 0x10, 0x47 | reg << 3, disp); }

void addpd_memory_reg(microasm *a, char disp, char reg)
{ asm_write(a, 5, 0x66, 0x0f, 0x58, 0x47 | reg << 3, disp); }
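
One thing asm_write doesn't do is check that there's room left in the buffer. A bounds-checked variant might look something like this; microasm_checked and asm_write_checked are hypothetical names, and the error convention (return 0 and write nothing) is just one reasonable choice.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variant of microasm that remembers where the buffer ends. */
typedef struct {
  char *dest;   /* next byte to write */
  char *end;    /* one past the last usable byte */
} microasm_checked;

/* Like asm_write, but refuses to write past `end`. Returns 1 on success and
 * 0 (writing nothing) if the n bytes would not fit. */
int asm_write_checked(microasm_checked *a, int n, ...) {
  va_list bytes;
  int i;
  if (n < 0 || (ptrdiff_t) n > a->end - a->dest) return 0;
  va_start(bytes, n);
  for (i = 0; i < n; ++i) *(a->dest++) = (char) va_arg(bytes, int);
  va_end(bytes);
  return 1;
}
```

A compiler built on this could stop with an error when a program is too long for the mapped page, rather than writing past the end of it.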

Putting it all together

Now that we can write assembly-level stuff, we can take the structure from the prototype JIT compiler and modify it to compile MandelASM.

// mandeljit.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#include "micro-asm.h"

#define sqr(x) ((x) * (x))

typedef struct { double r; double i; } complex;
typedef void(*compiled)(complex*);

#include <stddef.h>  // offsetof

compiled compile(char *code) {
  char *memory = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  microasm a = { .dest = memory };
  char src_dsp, dst_dsp;
  char const r = offsetof(complex, r);
  char const i = offsetof(complex, i);

  for (; *code; code += 3) {
    src_dsp = sizeof(complex) * (code[1] - 'a');
    dst_dsp = sizeof(complex) * (code[2] - 'a');
    switch (*code) {
      case '=':
        movpd_memory_reg(&a, src_dsp, xmm(0));
        movpd_reg_memory(&a, xmm(0), dst_dsp);
        break;

      case '+':
        movpd_memory_reg(&a, src_dsp, xmm(0));
        addpd_memory_reg(&a, dst_dsp, xmm(0));
        movpd_reg_memory(&a, xmm(0), dst_dsp);
        break;

      case '*':
        movsd_memory_reg(&a, src_dsp + r, xmm(0));
        movsd_memory_reg(&a, src_dsp + i, xmm(1));
        movsd_memory_reg(&a, dst_dsp + r, xmm(2));
        movsd_memory_reg(&a, dst_dsp + i, xmm(3));
        movsd_reg_reg   (&a, xmm(0), xmm(4));
        mulsd           (&a, xmm(2), xmm(4));
        movsd_reg_reg   (&a, xmm(1), xmm(5));
        mulsd           (&a, xmm(3), xmm(5));
        subsd           (&a, xmm(5), xmm(4));
        movsd_reg_memory(&a, xmm(4), dst_dsp + r);

        mulsd           (&a, xmm(0), xmm(3));
        mulsd           (&a, xmm(1), xmm(2));
        addsd           (&a, xmm(3), xmm(2));
        movsd_reg_memory(&a, xmm(2), dst_dsp + i);
        break;

      default:
        fprintf(stderr, "undefined instruction %s (ASCII %x)\n", code, *code);
        exit(1);
    }
  }

  // Return to caller (important! otherwise we'll segfault)
  asm_write(&a, 1, 0xc3);

  return (compiled) memory;
}

int main(int argc, char **argv) {
  compiled fn = compile(argv[1]);
  complex registers[4];
  int i, x, y;
  char line[1600];
  printf("P5\n%d %d\n%d\n", 1600, 900, 255);
  for (y = 0; y < 900; ++y) {
    for (x = 0; x < 1600; ++x) {
      registers[0].r = 2 * 1.6 * (x / 1600.0 - 0.5);
      registers[0].i = 2 * 0.9 * (y /  900.0 - 0.5);
      for (i = 1; i < 4; ++i) registers[i].r = registers[i].i = 0;
      for (i = 0; i < 256 && sqr(registers[1].r) + sqr(registers[1].i) < 4; ++i)
        (*fn)(registers);
      line[x] = i;
    }
    fwrite(line, 1, sizeof(line), stdout);
  }
  return 0;
}

Now let's benchmark the interpreted and JIT-compiled versions:

$ gcc mandeljit.c -o mandeljit
$ time ./simple *bb+ab > /dev/null
real	0m2.348s
user	0m2.344s
sys	0m0.000s
$ time ./mandeljit *bb+ab > /dev/null
real    0m1.462s
user    0m1.460s
sys     0m0.000s

That's within about 10% of the hardcoded version, which is the practical limit we measured earlier. And, of course, the JIT-compiled result is identical to the interpreted one:

$ ./simple *bb+ab | md5sum
12a1013d55ee17998390809ffd671dbc  -
$ ./mandeljit *bb+ab | md5sum
12a1013d55ee17998390809ffd671dbc  -

Further reading

Debugging JIT compilers

First, you need a good scotch.

Once you've got that set up, gdb can probably be scripted to do what you need. I've used it somewhat successfully to debug a bunch of hand-written self-modifying machine code with no debugging symbols -- the limitations of the approach ended up being whiskey-related rather than any deficiency of GDB itself.

I've also had some luck using radare2 to figure out when I was generating bogus instructions.

Offline tools like NASM and YASM won't help you much here, because the generated code only exists in memory at runtime.

Low-level

  • The Intel guides cover a lot of stuff we didn't end up using here: addressing modes, instructions, etc. If you're serious about writing JIT compilers, it's worth an in-depth read.

  • Agner Fog's guides to processor-level optimization: an insanely detailed tour through processor internals, instruction parsing pipelines, and pretty much every variant of every processor in existence.

  • The V8 source code: how JIT assemblers are actually written

  • The JVM source code

  • Jonesforth: a well-documented example of low-level code generation and interpreter structure (sort of a JIT alternative)

  • Canard machine code: similar to jonesforth, but uses machine code for its data structures
