• Stars
    star
    1,036
  • Rank 42,681 (Top 0.9 %)
  • Language
    C
  • License
    BSD 2-Clause "Sim...
  • Created over 3 years ago
  • Updated 23 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A self-hosting and educational C optimizing compiler

shecc : self-hosting and educational C compiler

logo image

Introduction

shecc is built from scratch, targeted at 32-bit Arm and RISC-V architecture, as a self-compiling compiler for a subset of the C language.

Features

  • Generate executable Linux ELF binaries for ARMv7-A and RV32IM;
  • Provide a minimal C standard library for basic I/O on GNU/Linux;
  • The cross-compiler is written in ANSI C, arguably running on most platforms;
  • Self-contained C language front-end and machine code generator;
  • Two-pass compilation: on the first pass it checks the syntax of statements and constructs a table of symbols, while on the second pass it actually translates program statements into Arm/RISC-V machine code.

Compatibility

shecc is capable of compiling C source files written in the following syntax:

  • data types: char, int, struct, and pointer
  • condition statements: if, while, for, switch, case, break, return, and general expressions
  • compound assignments: +=, -=, *=
  • global/local variable initializations for supported data types
    • e.g. int i = [expr]

The backend targets armv7hf with Linux ABI, verified on Raspberry Pi 3.

Bootstrapping

The steps to validate shecc bootstrapping:

  1. stage0: shecc source code is initially compiled using an ordinary compiler which generates a native executable. The generated compiler can be used as a cross-compiler.
  2. stage1: The built binary reads its own source code as input and generates an ARMv7-A/RV32IM binary.
  3. stage2: The generated ARMv7-A/RV32IM binary is invoked (via QEMU or running on Arm and RISC-V devices) with its own source code as input and generates another ARMv7-A/RV32IM binary.
  4. bootstrap: Build the stage1 and stage2 compilers, and verify that they are byte-wise identical. If so, shecc can compile its own source code and produce new versions of that same program.

Prerequisites

Code generator in shecc does not rely on external utilities. You only need ordinary C compilers such as gcc and clang. However, shecc would bootstrap itself, and Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user emulation on GNU/Linux:

$ sudo apt-get install qemu-user

It is still possible to build shecc on macOS or Microsoft Windows. However, the second stage bootstrapping would fail due to qemu-arm absence.

Build and Verify

Configure which backend you want, shecc supports ARMv7-A and RV32IM backend:

$ make config ARCH=arm
# Target machine code switch to Arm

$ make config ARCH=riscv
# Target machine code switch to RISC-V

Run make and you should see this:

  CC+LD	out/inliner
  GEN	out/libc.inc
  CC	out/src/main.o
  LD	out/shecc
  SHECC	out/shecc-stage1.elf
  SHECC	out/shecc-stage2.elf

File out/shecc is the first stage compiler. Its usage:

shecc [-o output] [--no-libc] [--dump-ir] <infile.c>

Compiler options:

  • -o : output file name (default: out.elf)
  • --no-libc : Exclude embedded C library (default: embedded)
  • --dump-ir : Dump intermediate representation (IR)

Example:

$ out/shecc -o fib tests/fib.c
$ chmod +x fib
$ qemu-arm fib

shecc comes with unit tests. To run the tests, give "check" as an argument:

$ make check

Reference output:

...
int main(int argc, int argv) { exit(sizeof(char)); } => 1
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } => 10
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } => 10
OK

To clean up the generated compiler files, execute the command make clean. For resetting architecture configurations, use the command make distclean`.

Intermediate Representation

Once the option --dump-ir is passed to shecc, the intermediate representation (IR) will be generated. Take the file tests/fib.c for example. It consists of a recursive Fibonacci sequence function.

int fib(int n)
{
    if (n == 0)
        return 0;
    else if (n == 1)
        return 1;
    return fib(n - 1) + fib(n - 2);
}

Execute the following to generate IR:

$ out/shecc --dump-ir -o fib tests/fib.c

Line-by-line explanation between C source and IR:

 C Source            IR                         Explanation
-------------------+--------------------------+----------------------------------------------------

int fib(int n)      fib:                        Reserve stack frame for function fib
{                     {
    if (n == 0)         x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 0                 Set x1 to zero
                        x0 == x1 ?              Compare x0 with x1
                        if false then goto 1641 If x0 != x1, then jump to label 1641
        return 0;       x0 := 0                 Set x0 to zero. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1641:
    else if (n == 1)    x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 == x1 ?              Compare x0 with x1
                        if true then goto 1649  If x0 != x1, then jump to label 1649
        return 1;       x0 := 1                 Set x0 to 1. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1649:
    return              x0 = &n                 Get address of variable n
       fib(n - 1)       x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 1)
       +                x0 := fib() @ 1631      Call function fib() into x0
                        push x0                 Store the result on stack
       fib(n - 2);      x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 2                 Set x1 to 2
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 2)
                        x1 := fib() @ 1631      Call function fib() into x1
                        pop x0                  Retrieve the result off stack into x0
                        x0 += x1                Add x1 to x0 i.e. the result of fib(n-1) + fib(n-2)
                        return (from fib)       Jump to function exit
                      }                         Restore the previous stack frame
                      exit fib

Known Issues

  1. The generated ELF lacks of .bss and .rodata section
  2. The support of varying number of function arguments is incomplete. No <stdarg.h> can be used. Alternatively, check the implementation printf in source lib/c.c for var_arg.
  3. The C front-end is a bit dirty because there is no effective AST.

License

shecc is freely redistributable under the BSD 2 clause license. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

More Repositories

1

lkmpg

The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)
TeX
6,855
star
2

lab0-c

C Programming Lab: Assessing Your C Programming Skills
C
382
star
3

rv32emu

Compact and Efficient RISC-V RV32I[MAFC] emulator
C
326
star
4

simplefs

A simple native file system for Linux kernel
C
308
star
5

concurrent-programs

Complementary Concurrency Programs for course "Linux Kernel Internals"
C
301
star
6

jitboy

A Game Boy emulator with dynamic recompilation (JIT)
C
288
star
7

cpumemory-zhtw

Traditional Chinese translation of "What Every Programmer Should Know About Memory"
CSS
216
star
8

semu

A minimalist RISC-V system emulator capable of running Linux kernel
C
212
star
9

vwifi

A virtual wireless device driver for Linux
C
174
star
10

kvm-host

A minimalist type 2 hypervisor using Linux Kernel Virtual Machine (KVM)
C
137
star
11

pitifulvm

A shabby implementation of Java virtual machine in C
C
137
star
12

vcam

Virtual camera device driver for Linux
C
87
star
13

sehttpd

A small and efficient web server with 1K lines of C code
C
82
star
14

cserv

An event-driven and non-blocking web server
C
70
star
15

concurrent-ll

concurrent linked list implementation
C
68
star
16

khttpd

An experimental HTTP server implemented as Linux kernel module
C
60
star
17

rv32emu-legacy

RISC-V RV32I[MA] emulator with ELF support
C
47
star
18

raycaster

Wolfenstein 3D-style raycasting implementation
C
42
star
19

linux-list

Linux-like doubly-linked list
C
39
star
20

fibdrv

Linux kernel module that calculates Fibonacci numbers
Shell
37
star
21

gameboy-emu

An efficient and portable Game Boy emulator
C
36
star
22

lkm-hidden

A Linux kernel module which hides itself
C
29
star
23

rubi

Ruby-like high-performance script programming language with JIT compilation
C
25
star
24

kecho

A lightweight echo server implementation in Linux kernel mode
C
25
star
25

threaded-logger

Threaded Logger
C
21
star
26

threadkit

A collection of lightweight threading utilities
C
20
star
27

vinput

A collection of virtual input device drivers for Linux
C
20
star
28

linux-cfs-sim

Simulate Linux Completely Fair Scheduler (CFS) using POSIX Threads
C
18
star
29

rnnoise

A noise suppression library based on a recurrent neural network
C
17
star
30

kcalc

Math expression evaluation as Linux kernel module
C
16
star
31

dict

Ternary Search Tree + Bloom filter
C
15
star
32

jitcalc

A simple integer calculator using JIT compilation
C
15
star
33

y86_64-tools

Y86-64 Tools: assembler, simulator, Verilog designs
C
14
star
34

fastcat

A faster "cat" implementation using splice and sendfile system calls
C
13
star
35

neocon

A simple serial console utility
C
13
star
36

bignum

An incomplete arbitrary-precision integer arithmetic library
C
13
star
37

fiber

A User Space Threading Library
C
13
star
38

compute-pi

Leibniz formula for ฯ€
C
12
star
39

ca2023-lab3

Lab3: Construct a single-cycle CPU with Chisel
Scala
12
star
40

mapreduce

A simple C Thread pool implementation
C
12
star
41

prefix-search

Implement prefix search using ternary search tree
C
12
star
42

jit-construct

JIT compiler from scratch, derived from Nick Desaulniers' great work
Lua
11
star
43

datalab

Improved CS:APP Data Lab
C
9
star
44

buddy

Buddy Memory Allocator
C
8
star
45

kilo

A text editor in less than 1000 LoC with syntax highlight and search
C
8
star
46

moxiebox

A secure, sandboxed execution mechanism that enables deterministic input, processing and output
C
8
star
47

phonebook

sample phonebook program to illustrate the impact of cache miss
Shell
8
star
48

raytracing

Small ray tracing program for performance evaluation
C
8
star
49

intrusive-ds

A collection of intrusive data-structures for C
C
8
star
50

concurrency-primer

Concurrency Primer
TeX
8
star
51

gecos

GECOS: A lock-free synchronization mechanism
C
7
star
52

nyancat

Nyancat rendered in your terminal
C
6
star
53

matrix_oo

Sample matrix implementation illustrating object-oriented techniques in C99
Shell
6
star
54

dont-trace

A simple Linux kernel module that kills ptrace tracer and its tracees
C
6
star
55

kfifo-examples

Linux kernel module examples about kfifo
C
5
star
56

mergesort-concurrent

merge sort on singly-linked list utilzing POSIX Thread
C
5
star
57

tinymembench

Measure peak bandwidth of sequential memory accesses and the latency of random memory accesses
C
5
star
58

align-bench

Microbenchmark for unaligned memory access
C
4
star
59

cirbuf

Circular Buffer implementation with mmap(2) *incomplete*
C
4
star
60

kcalc-fixed

Math expression evaluation as Linux kernel module, fixed-point implementation
C
4
star
61

prefetcher

Evaluate the effects of prefetching
Shell
3
star
62

malloc-test-concurrent

concurrent malloc benchmark
C
3
star
63

sched-plugin

A Linux kernel module to allow user processes being handed out with LKM based scheduler
C
3
star
64

phonebook-concurrent

build a phonebook program by concurrent linked list
C
3
star
65

simrupt

A Linux device driver that simulates interrupts
C
3
star
66

tco-test

Test the ability of C compilers performing Tail Call Optimization
C
2
star
67

balanced-ternary

Ilustrate how balanced ternary works
Shell
2
star
68

vsnd

Virtual Linux soundcard driver
C
2
star
69

bf-runtime

Brainf*ck runtime engine
C
1
star
70

quotient-filter

(Incomplete) in-memory quotient filter
C
1
star
71

ksort

Linux kernel module that implements and validates sorting algorithms
C
1
star
72

arm-assembler-latex-listings

Arm Assembler language definition for the LaTeX listings package
TeX
1
star
73

clz-tests

Evaluate implementations of count leading zero
C
1
star