Compiler Quest
The main goal is to continually level-up a self-hosting Haskell compiler. However, it’s a survival game as well as an RPG: it should always be possible to build this compiler starting with only a C compiler.
I had thought this was a daunting challenge. In the past, constructing a parser alone was a laborious grind for me. Then it got worse: compilers manipulate abstract syntax trees, so the self-hosting requirement demands the source language be complex enough to handle compound data types.
This time around, I found it shockingly easy to bootstrap a compiler for stripped-down Haskell (or perhaps I should say souped-up lambda calculus), thanks to a few tricks:
-
Parsing combinators are a joy to build from scratch and a joy to use.
-
Kiselyov’s bracket abstraction algorithm is simple yet practical.
-
Interpreting basic combinators is child’s play, and almost the only task we need perform in C or assembly.
-
Lambda calculus gives us the Scott encoding for free. Thus we effortlessly gain algebraic data types.
-
Laziness is at odds with native instructions, which are eagerly evaluated. However, we can readily reconcile their differences with shrewdly chosen combinators.
Perhaps the greatest difficulty was mustering the discipline to mark the road taken so that anyone with a C compiler can follow.
How to build
The earliest generations of our compiler reside in vm.c
:
$ cc -O2 vm.c -o vm
When run without arguments, this program gets each compiler to compile its
successor, which results in a series of numbers that we output to raw
:
$ ./vm > raw
These numbers are a compiled form of the barely.hs
compiler. The vm run
command reads this raw
file and interprets it to compile a given Haskell
file:
$ echo "prependH s = 'H':s;" > /tmp/example.hs $ echo "ello, World!" | ./vm run /tmp/example.hs
It only accepts code that type-checks. Moreover, the (⇐)
operator is
undefined and disallows mixing the (++)
operator with (:)
unless its fixity
has been declared. This conflicts with the examples bundled with
my IOCCC entry. The vm ioccc
command
inserts code to fix these issues:
$ ./vm ioccc fib.hs
The compilers thus far expect pure functions as input. The last function should
have type String → String
, and we implicitly wrap it in the interact
function from the standard Haskell Prelude during compilation.
The effectively.hs
compiler bucks the trend. It assumes the last function
has type IO ()
, and treats it like main
in a standard Haskell program.
It also has support for FFI.
The lonely.hs
compiler is the first generation with a main function of
type IO ()
. It also outputs C code that should be appended to rts.c
.
In sum, after running the above to produce raw
, we can build a standalone
compiler with:
$ (cat rts.c; ./vm run effectively.hs < lonely.hs) > lonely.c $ cc -O2 lonely.c -o lonely
(A few generations later, our virtually.hs
compiler bundles the runtime
system with the output, so the cat
is no longer needed.)