AMaCC = Arguably Minimalist Arm C Compiler
Introduction
AMaCC is built from scratch and targets the 32-bit Arm architecture. It compiles a stripped-down version of C and is meant as a pedagogical tool for learning about compilers, linkers, and loaders.
AMaCC implements two execution modes:
- Just-in-time compilation (JITC) for the Arm backend
- Generation of valid GNU/Linux executables in the Executable and Linkable Format (ELF)
AMaCC is designed to compile the subset of C required to self-host with the above execution modes. For example, global variables and, in particular, global arrays are supported.
A simple stack-based AST is generated through the cooperating stmt() and expr() parsing functions, both of which are fed by a token-generating function. The expr() function does some literal constant optimizations. The AST is transformed into a stack-based VM Intermediate Representation (IR) via a gen() function. The IR can be examined through a command-line option. Finally, a codegen() function generates ARM32 instructions from the IR, which can be executed via either jit() or elf32() executable generation.
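The literal constant optimization mentioned above can be illustrated with a small constant-folding sketch. This is not AMaCC's actual code: the Node layout, the node kinds, and the fold() helper are hypothetical names chosen for this example. The idea is that when both operands of a binary operator are literals, the subtree can be replaced by a single literal node before any IR is generated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node for illustration; AMaCC's real AST layout differs. */
enum Kind { NUM, VAR, ADD, SUB, MUL };

struct Node {
    enum Kind kind;
    int value;              /* meaningful only when kind == NUM */
    struct Node *lhs, *rhs; /* meaningful only for ADD, SUB, MUL */
};

/* Fold a binary node in place when both operands are literals.
 * Returns 1 if the node became a NUM, 0 otherwise. */
int fold(struct Node *n)
{
    int a, b;

    assert(n != NULL);
    if (n->kind == NUM || n->kind == VAR)
        return 0; /* leaves have nothing to fold */

    /* Fold children first so nested constant subtrees collapse bottom-up. */
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->kind != NUM || n->rhs->kind != NUM)
        return 0;

    a = n->lhs->value;
    b = n->rhs->value;
    switch (n->kind) {
    case ADD: n->value = a + b; break;
    case SUB: n->value = a - b; break;
    case MUL: n->value = a * b; break;
    default:  return 0;
    }
    n->kind = NUM;
    n->lhs = n->rhs = NULL;
    return 1;
}
```

With this sketch, a tree for 2 + 3 * 4 collapses into a single NUM node, while x + 1 is left untouched because one operand is not a literal.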
AMaCC mixes classical recursive descent and operator precedence parsing. For expressions, an operator precedence parser is considerably faster than a recursive descent parser (RDP), because an RDP defines operator precedence through grammar productions, each of which would otherwise become a separate function.
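To show how operator precedence parsing avoids one function per precedence level, here is a minimal precedence-climbing evaluator. It is a sketch under stated assumptions, not AMaCC's parser: eval_expr(), parse_expr(), parse_primary(), and prec() are hypothetical names, the input is limited to non-negative integer literals, the operators + - * /, and parentheses, and the code evaluates directly instead of building an AST.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Binding strength of a binary operator; 0 means "not an operator". */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static void skip_spaces(const char **p)
{
    while (isspace((unsigned char) **p))
        (*p)++;
}

static int parse_expr(const char **p, int min_prec);

/* primary := integer | '(' expr ')' */
static int parse_primary(const char **p)
{
    int v = 0;

    skip_spaces(p);
    if (**p == '(') {
        (*p)++;
        v = parse_expr(p, 1);
        skip_spaces(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    while (isdigit((unsigned char) **p))
        v = v * 10 + (*(*p)++ - '0');
    return v;
}

/* A single loop serves every precedence level: it keeps consuming
 * operators that bind at least as tightly as min_prec. */
static int parse_expr(const char **p, int min_prec)
{
    int lhs = parse_primary(p);

    for (;;) {
        char op;
        int pr, rhs;

        skip_spaces(p);
        op = **p;
        pr = prec(op);
        if (pr == 0 || pr < min_prec)
            return lhs;
        (*p)++;
        /* Parsing the right side at pr + 1 makes operators left-associative. */
        rhs = parse_expr(p, pr + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break; /* guard division by zero */
        }
    }
}

/* Hypothetical entry point: evaluate a NUL-terminated expression string. */
int eval_expr(const char *src)
{
    assert(src != NULL);
    return parse_expr(&src, 1);
}
```

Adding a new precedence level here only requires another case in prec(), whereas a pure recursive descent parser would need an additional function and an extra call on every expression.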
Compatibility
AMaCC is capable of compiling C source files written in the following syntax:
- support for all C89 statements except typedef.
- support for all C89 expression operators.
- data types: char, int, enum, struct, union, and multi-level pointers
- type modifiers, qualifiers, and storage class specifiers are currently unsupported, though many keywords of this nature are not routinely used, and can be easily worked around with simple alternative constructs.
- struct/union assignments are not supported at the language level in AMaCC, e.g. s1 = s2. This also applies to function return values and parameters. Passing and returning pointers is recommended. Use memcpy if you want to copy a full struct, e.g. memcpy(&s1, &s2, sizeof(struct xxx));
- global/local variable initializations for supported data types
  - e.g., int i = [expr]
  - New variables may be declared anywhere within a function.
  - item-by-item array initialization is supported
  - aggregate array declaration and initialization is not yet supported
    - e.g., int foo[2][2] = { { 1, 0 }, { 0, 1 } };
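The two workarounds named in the list above, copying a struct with memcpy and initializing an array item by item, might look like the following. This is a sketch in standard C that has not been verified against AMaCC itself: struct point, copy_point(), and init_identity() are hypothetical names, and "item by item" is interpreted here as one assignment per element.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Copy *src into *dst without struct assignment (*dst = *src). */
void copy_point(struct point *dst, struct point *src)
{
    assert(dst && src);
    memcpy(dst, src, sizeof(struct point));
}

/* Fill a 2x2 identity matrix one element at a time instead of using
 * the aggregate initializer { { 1, 0 }, { 0, 1 } }. */
void init_identity(int foo[2][2])
{
    foo[0][0] = 1;
    foo[0][1] = 0;
    foo[1][0] = 0;
    foo[1][1] = 1;
}
```

Passing pointers to both helpers also follows the recommendation above to avoid struct parameters and struct return values.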
AMaCC targets armv7hf with the Linux ABI and has been verified on Raspberry Pi 2/3/4 running GNU/Linux.
Prerequisites
- The code generator in AMaCC relies on several GNU/Linux behaviors, so Arm/Linux must be installed in your build environment.
- Install GNU Toolchain for the A-profile Architecture
  - Select arm-none-linux-gnueabihf (AArch32 target with hard float)
- Install QEMU for Arm user emulation
  sudo apt-get install qemu-user
Running AMaCC
Run make check and you should see this:
[ C to IR translation ] Passed
[ JIT compilation + execution ] Passed
[ ELF generation ] Passed
[ nested/self compilation ] Passed
[ Compatibility with GCC/Arm ] ........................................
----------------------------------------------------------------------
Ran 52 tests in 8.842s
OK
Check the messages generated by make help to learn more.
Benchmark
AMaCC generates machine code quickly (0.0217 s to compile amacc.c, versus 0.4884 s for gcc -O0 -c in the measurement below) and provides 70% of the performance of gcc -O0.
Test environment:
- Raspberry Pi 4B (SoC: bcm2711, ARMv8-A architecture)
- Raspbian GNU/Linux, kernel 5.10.17-v7l+, gcc 8.3.0 (armv7l userland)
Input source file: amacc.c
compiler driver | binary size (KiB) | compile time (s) |
---|---|---|
gcc with -O0 -ldl (compile+link) | 56 | 0.5683 |
gcc with -O0 -c (compile only) | 56 | 0.4884 |
AMaCC | 100 | 0.0217 |
Internals
Check Intermediate Representation (IR) for AMaCC Compilation.
Acknowledgements
AMaCC is based on the infrastructure of c4.