• Stars
    star
    273
  • Rank 145,636 (Top 3 %)
  • Language
    Ruby
  • Created over 15 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Code from my series on writing a Ruby compiler in Ruby

Writing a (Ruby) compiler in Ruby

Source for my series on writing a compiler in Ruby.

See http://www.hokstad.com/compiler

NOTE This is still wildly incomplete.

Status as of May 14th 2023

  • The compiler self-hosts, but is slow and GC is disabled.
  • Self-hosting was achieved by working around a number of bugs that still needs to be fixed.
  • Type tagging has been put in place for integers, which drastically reduces amount of object creation
  • Starting to work through some of the updates needed to approach Ruby 3.2 compatibility. That is a very long road.
  • Starting looking at what will be needed to work with Ruby spec. Challenge is figuring out how to make it work w/ahead of time compilation without e.g. support for eval (which I might eventually add, but that's far away.

Older highlights

  • Garbage collector is integrated; garbage collection article nearly done.
  • A number of compiler bugs have been worked around to get this far. Most sites in the compiler affected are marked with @bug
  • Current garbage collection overhead is a problem. A few easy wins, and avenues to investigate:
    • Pre-create objects for all constants (numeric and string in particular)
    • Currently Proc and env objects are created separately; might be worth allocating the env as part of the Proc object, but not sure it's worthwhile
    • Capture stats on number of allocated objects per class, and output,
  • When compiling the compiler with itself it successfully parses all of itself and produces identical output to when run under MRI. This does not mean the parse is complete (it absolutely is not), or bug free - it means the parser acts correctly on the very specific subset of expressions currently present in the compiler itself.

Assuming I get time to continue current progress, the compiler might fully compile itself and the compiled version might be able to compile itself this autumn.

(to make that clear, what I want to get to is:

  1. Run the compiler source with MRI on its own source to produce a "compiler1" that is a native i386 binary
  2. Run "compiler1" with its own source as input to produce a "compiler2"
  3. Run "compiler2" with its own source as input to produce a "compiler3"

Currently step 1 "works" to the extent that it produces a binary, but that binary has bugs, and so fails to produce a compiler2. To complete the bootstrap process I need it to complete the compile and produce a binary, but I also need that binary to be correct. I can part-validate that by comparing it to "compiler1" - they should have identical assembler source, but the best way of validating it fully is to effectively repeat step 2, but with "compiler2" as the input, and verify that "compiler2" and "compiler3" are identical, to validate the entire end-to-end process. This may seem paranoid, but once step2 works the point is step3 should be trivial, so there's no point in not taking that extra step.

Before getting too excited about trying to use the compiler at the point when it bootstraps fully, note:

  • The compiler itself carefully avoids known missing functionality, and/or I work around some during testing the bootstrap. The big ones:
    • Exceptions (used by the compiler, but only begin/rescue causes problems and that's only used once; commented out for testing)
    • Regexp (not used by the compiler)
    • Float (not used by the compiler)
  • The compiler code is littered with workarounds for specific bugs (they're not consistently marked, but FIXME will include all of the workarounds for compiler bugs and more, and whenever I find new ones they're also marked @bug).
  • The GC mentioned above is very simple and not well suited for the sheer amount of objects currently allocated. It needs a number of improvements to handle many small objects, and the compiler needs additional work to reduce the number of objects created.

Once the compiler is bootstrapped w/workarounds, my next steps are:

  • Add support for exceptions (prob. worth a blog post)
  • Go through the current FIXME's and explicitly check which are still relevant (some have likely been fixed as a result of other bug fixes); add test cases, and fix them in turn.
  • Make mspec compile
  • Make the Ruby Spec Suite run, and cry over how large parts of it will fail.
  • Some of the GC improvements mentioned above.

Caveats

This section covers caveats about compiled Ruby vs. MRI, not generally missing pieces or bugs in the current state of the compiler (of which there are many).

require

Presently, "require" is evaluated statically at compile time.

This makes certain Ruby patterns hard or impossible to support. E.g. reading the contents of a directory and caling "require" for each .rb file found will not presently work, and may never work, as it is not clear in the context of compilation whether or not the intent is to load this file at compile time or runtime.

Ruby allows the argument to "require" to be dynamically generated. E.g. "require File.dirname(FILE) + '/blah'". To facilitate compatibility, limited forms of this pattern may eventually be supported.

On MRI, "require" is generally overridden by a custom version for rubygems or bundler. This is not likely to ever be supported. "require" is likely to be treated as a keyword, rather than as an overrideable method.

$0

While $0 will at some point be initialized with the name of the file compilation is triggered for, certain patterns of Ruby, such as conditionally executing code based on whether a given file is executed directly are conceptually different, given that $0 gets bound at compile time.

We'll need to consider if the right behaviour is for $0 and/or __FILE__ to contain the equivalent of C's argv[0] instead. Possibly make $0 and __FILE__ refer to different things.

$:, $LOAD_PATH

The load path is malleable in MRI, and this is very frequently used alongside certain methods to modify which files may be loaded. Currently this is not supported.

It is likely that for compatibility a limited subset of Ruby will be interpreted at compile time to support some forms of this pattern. See also "require"

More Repositories

1

rubywm

An X11 window manager in pure Ruby
Ruby
133
star
2

diagram-tools

A number of small tools for generating and manipulating diagrams, mostly based around Graphviz
XSLT
109
star
3

sliding-stats

Sliding stats is Rack Middleware that presents a "sliding window" over a web apps last N requests (where N is a reasonably low value, as it's all kept in memory). This is currently a "work in progress" - don't deploy this anywhere without testing it first; It is likely to have nasty bugs.
Ruby
36
star
4

wave-gadget-emulator

Small emulation layer for the Google Wave gadget API to run a gadget side by side in two iframes
JavaScript
17
star
5

FrexxEd

FrexxEd is a flexible Amiga text editor with scripting support, ARexx, multiple views, built in filesystem handler for open buffers and more
C
17
star
6

re

A very simple, very much in progress text editor written in Ruby.
Ruby
16
star
7

Git

AmigaOS/AROS/MorphOS port of Git
C
9
star
8

ACE

Compiler for a large subset of Amiga BASIC + extensions
C
9
star
9

skrift

A pure Ruby conversion (*not* wrapper) of the libschrift TrueType font renderer
Ruby
9
star
10

wave-gadget-svg-editor

SVG editor as a Google Wave gadget
6
star
11

packrat

Package manager for AmigaOS and Amiga-like OS's (AROS etc.)
C
5
star
12

FPL

Frexx Programming Language - a portable scripting language originating on the Amiga
C
5
star
13

amiga-restoration

Project to "restore" classic Amiga freeware so that it will build and run on AROS and AmigaOS 3.5+
C
3
star
14

html2builder

Take HTML in, and spit Markaby / Nokogiri style Builder code out
Ruby
3
star
15

rouge-gtk_theme_loader

Load GtkSourceView themes into Rouge (Ruby syntax highlighter)
Ruby
3
star
16

sdlvnc

Shell
2
star
17

svg_graph

SVG graph generation in Ruby
Ruby
2
star
18

amiga-spec

Test suite for AmigaOS compatible OS's
C
2
star
19

rubyterm

Tiny ruby terminal for X11
Makefile
2
star
20

p9fs-ruby

A very early/incomplete/experimental/broken start of a Plan 9 filesystem implementation in Ruby
Ruby
1
star
21

make-rules

A small collection of somewhat reusable Makefile rules I include in projects
Makefile
1
star
22

filters

Experimental set of line oriented text filters
Ruby
1
star
23

sequel-track-changes

Small Sequel plugin to track changes to tables manipulated via Sequel models.
Ruby
1
star
24

aros-vm

Scripts to build a Linux OpenVz container suitable for AROS development
Shell
1
star
25

file_writer

Ruby gem to overwrite files safely and efficiently and make backup copy.
Ruby
1
star
26

m68k-amiga-crosstools

Makefile to generate Debian package of gcc, binutils etc. to compile M68k Amiga code on x86 Linux
1
star
27

collectionadapters

Adapters for Ruby collections classes
Ruby
1
star