• Stars
    star
    1,119
  • Rank 41,531 (Top 0.9 %)
  • Language
    C
  • License
    BSD 2-Clause "Sim...
  • Created about 4 years ago
  • Updated about 2 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A self-hosting and educational C optimizing compiler

shecc : self-hosting and educational C compiler

logo image

Introduction

shecc is built from scratch, targeted at 32-bit Arm and RISC-V architecture, as a self-compiling compiler for a subset of the C language.

Features

  • Generate executable Linux ELF binaries for ARMv7-A and RV32IM;
  • Provide a minimal C standard library for basic I/O on GNU/Linux;
  • The cross-compiler is written in ANSI C, arguably running on most platforms;
  • Self-contained C language front-end and machine code generator;
  • Two-pass compilation: on the first pass it checks the syntax of statements and constructs a table of symbols, while on the second pass it actually translates program statements into Arm/RISC-V machine code.

Compatibility

shecc is capable of compiling C source files written in the following syntax:

  • data types: char, int, struct, and pointer
  • condition statements: if, while, for, switch, case, break, return, and general expressions
  • compound assignments: +=, -=, *=
  • global/local variable initializations for supported data types
    • e.g. int i = [expr]

The backend targets armv7hf with Linux ABI, verified on Raspberry Pi 3.

Bootstrapping

The steps to validate shecc bootstrapping:

  1. stage0: shecc source code is initially compiled using an ordinary compiler which generates a native executable. The generated compiler can be used as a cross-compiler.
  2. stage1: The built binary reads its own source code as input and generates an ARMv7-A/RV32IM binary.
  3. stage2: The generated ARMv7-A/RV32IM binary is invoked (via QEMU or running on Arm and RISC-V devices) with its own source code as input and generates another ARMv7-A/RV32IM binary.
  4. bootstrap: Build the stage1 and stage2 compilers, and verify that they are byte-wise identical. If so, shecc can compile its own source code and produce new versions of that same program.

Prerequisites

Code generator in shecc does not rely on external utilities. You only need ordinary C compilers such as gcc and clang. However, shecc would bootstrap itself, and Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user emulation on GNU/Linux:

$ sudo apt-get install qemu-user

It is still possible to build shecc on macOS or Microsoft Windows. However, the second stage bootstrapping would fail due to qemu-arm absence.

Build and Verify

Configure which backend you want, shecc supports ARMv7-A and RV32IM backend:

$ make config ARCH=arm
# Target machine code switch to Arm

$ make config ARCH=riscv
# Target machine code switch to RISC-V

Run make and you should see this:

  CC+LD	out/inliner
  GEN	out/libc.inc
  CC	out/src/main.o
  LD	out/shecc
  SHECC	out/shecc-stage1.elf
  SHECC	out/shecc-stage2.elf

File out/shecc is the first stage compiler. Its usage:

shecc [-o output] [--no-libc] [--dump-ir] <infile.c>

Compiler options:

  • -o : output file name (default: out.elf)
  • --no-libc : Exclude embedded C library (default: embedded)
  • --dump-ir : Dump intermediate representation (IR)

Example:

$ out/shecc -o fib tests/fib.c
$ chmod +x fib
$ qemu-arm fib

shecc comes with unit tests. To run the tests, give "check" as an argument:

$ make check

Reference output:

...
int main(int argc, int argv) { exit(sizeof(char)); } => 1
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; case 3: a = 10; break; case 1: return 0; } exit(a); } => 10
int main(int argc, int argv) { int a; a = 0; switch (3) { case 0: return 2; default: a = 10; break; } exit(a); } => 10
OK

To clean up the generated compiler files, execute the command make clean. For resetting architecture configurations, use the command make distclean`.

Intermediate Representation

Once the option --dump-ir is passed to shecc, the intermediate representation (IR) will be generated. Take the file tests/fib.c for example. It consists of a recursive Fibonacci sequence function.

int fib(int n)
{
    if (n == 0)
        return 0;
    else if (n == 1)
        return 1;
    return fib(n - 1) + fib(n - 2);
}

Execute the following to generate IR:

$ out/shecc --dump-ir -o fib tests/fib.c

Line-by-line explanation between C source and IR:

 C Source            IR                         Explanation
-------------------+--------------------------+----------------------------------------------------

int fib(int n)      fib:                        Reserve stack frame for function fib
{                     {
    if (n == 0)         x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 0                 Set x1 to zero
                        x0 == x1 ?              Compare x0 with x1
                        if false then goto 1641 If x0 != x1, then jump to label 1641
        return 0;       x0 := 0                 Set x0 to zero. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1641:
    else if (n == 1)    x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 == x1 ?              Compare x0 with x1
                        if true then goto 1649  If x0 != x1, then jump to label 1649
        return 1;       x0 := 1                 Set x0 to 1. x0 is the return value.
                        return (from fib)       Jump to function exit
                    1649:
    return              x0 = &n                 Get address of variable n
       fib(n - 1)       x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 1                 Set x1 to 1
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 1)
       +                x0 := fib() @ 1631      Call function fib() into x0
                        push x0                 Store the result on stack
       fib(n - 2);      x0 = &n                 Get address of variable n
                        x0 = *x0 (4)            Read value from address into x0, length = 4 (int)
                        x1 := 2                 Set x1 to 2
                        x0 -= x1                Subtract x1 from x0 i.e. (n - 2)
                        x1 := fib() @ 1631      Call function fib() into x1
                        pop x0                  Retrieve the result off stack into x0
                        x0 += x1                Add x1 to x0 i.e. the result of fib(n-1) + fib(n-2)
                        return (from fib)       Jump to function exit
                      }                         Restore the previous stack frame
                      exit fib

Known Issues

  1. The generated ELF lacks of .bss and .rodata section
  2. The support of varying number of function arguments is incomplete. No <stdarg.h> can be used. Alternatively, check the implementation printf in source lib/c.c for var_arg.
  3. The C front-end is a bit dirty because there is no effective AST.

License

shecc is freely redistributable under the BSD 2 clause license. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

More Repositories

1

lkmpg

The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)
TeX
7,496
star
2

lab0-c

C Programming Lab: Assessing Your C Programming Skills
C
407
star
3

rv32emu

Compact and Efficient RISC-V RV32I[MAFC] emulator
C
393
star
4

simplefs

A simple native file system for Linux kernel
C
372
star
5

concurrent-programs

Complementary Concurrency Programs for course "Linux Kernel Internals"
C
301
star
6

jitboy

A Game Boy emulator with dynamic recompilation (JIT)
C
299
star
7

semu

A minimalist RISC-V system emulator capable of running Linux kernel
C
252
star
8

cpumemory-zhtw

Traditional Chinese translation of "What Every Programmer Should Know About Memory"
CSS
216
star
9

vwifi

A virtual wireless device driver for Linux
C
201
star
10

kvm-host

A minimalist type 2 hypervisor using Linux Kernel Virtual Machine (KVM)
C
152
star
11

pitifulvm

A shabby implementation of Java virtual machine in C
C
138
star
12

vcam

Virtual camera device driver for Linux
C
89
star
13

sehttpd

A small and efficient web server with 1K lines of C code
C
82
star
14

concurrency-primer

Concurrency Primer
TeX
75
star
15

cserv

An event-driven and non-blocking web server
C
70
star
16

concurrent-ll

concurrent linked list implementation
C
68
star
17

khttpd

An experimental HTTP server implemented as Linux kernel module
C
60
star
18

rv32emu-legacy

RISC-V RV32I[MA] emulator with ELF support
C
47
star
19

raycaster

Wolfenstein 3D-style raycasting implementation
C
44
star
20

linux-list

Linux-like doubly-linked list
C
39
star
21

fibdrv

Linux kernel module that calculates Fibonacci numbers
Shell
37
star
22

gameboy-emu

An efficient and portable Game Boy emulator
C
36
star
23

mado

A window system for resource-constrained devices
C
34
star
24

lkm-hidden

A Linux kernel module which hides itself
C
29
star
25

kecho

A lightweight echo server implementation in Linux kernel mode
C
27
star
26

rubi

Ruby-like high-performance script programming language with JIT compilation
C
25
star
27

threaded-logger

Threaded Logger
C
21
star
28

threadkit

A collection of lightweight threading utilities
C
20
star
29

vinput

A collection of virtual input device drivers for Linux
C
20
star
30

linux-cfs-sim

Simulate Linux Completely Fair Scheduler (CFS) using POSIX Threads
C
18
star
31

rnnoise

A noise suppression library based on a recurrent neural network
C
17
star
32

kcalc

Math expression evaluation as Linux kernel module
C
17
star
33

dict

Ternary Search Tree + Bloom filter
C
15
star
34

jitcalc

A simple integer calculator using JIT compilation
C
15
star
35

y86_64-tools

Y86-64 Tools: assembler, simulator, Verilog designs
C
14
star
36

fastcat

A faster "cat" implementation using splice and sendfile system calls
C
13
star
37

neocon

A simple serial console utility
C
13
star
38

bignum

An incomplete arbitrary-precision integer arithmetic library
C
13
star
39

fiber

A User Space Threading Library
C
13
star
40

compute-pi

Leibniz formula for π
C
12
star
41

ca2023-lab3

Lab3: Construct a single-cycle CPU with Chisel
Scala
12
star
42

mapreduce

A simple C Thread pool implementation
C
12
star
43

prefix-search

Implement prefix search using ternary search tree
C
12
star
44

jit-construct

JIT compiler from scratch, derived from Nick Desaulniers' great work
Lua
11
star
45

datalab

Improved CS:APP Data Lab
C
9
star
46

buddy

Buddy Memory Allocator
C
8
star
47

moxiebox

A secure, sandboxed execution mechanism that enables deterministic input, processing and output
C
8
star
48

phonebook

sample phonebook program to illustrate the impact of cache miss
Shell
8
star
49

raytracing

Small ray tracing program for performance evaluation
C
8
star
50

intrusive-ds

A collection of intrusive data-structures for C
C
8
star
51

kilo

A text editor in less than 1000 LoC with syntax highlight and search
C
8
star
52

gecos

GECOS: A lock-free synchronization mechanism
C
7
star
53

nyancat

Nyancat rendered in your terminal
C
6
star
54

matrix_oo

Sample matrix implementation illustrating object-oriented techniques in C99
Shell
6
star
55

dont-trace

A simple Linux kernel module that kills ptrace tracer and its tracees
C
6
star
56

kfifo-examples

Linux kernel module examples about kfifo
C
5
star
57

mergesort-concurrent

merge sort on singly-linked list utilzing POSIX Thread
C
5
star
58

tinymembench

Measure peak bandwidth of sequential memory accesses and the latency of random memory accesses
C
5
star
59

cirbuf

Circular Buffer implementation with mmap(2) *incomplete*
C
4
star
60

align-bench

Microbenchmark for unaligned memory access
C
4
star
61

kcalc-fixed

Math expression evaluation as Linux kernel module, fixed-point implementation
C
4
star
62

malloc-test-concurrent

concurrent malloc benchmark
C
3
star
63

prefetcher

Evaluate the effects of prefetching
Shell
3
star
64

sched-plugin

A Linux kernel module to allow user processes being handed out with LKM based scheduler
C
3
star
65

phonebook-concurrent

build a phonebook program by concurrent linked list
C
3
star
66

simrupt

A Linux device driver that simulates interrupts
C
3
star
67

tco-test

Test the ability of C compilers performing Tail Call Optimization
C
2
star
68

vsnd

Virtual Linux soundcard driver
C
2
star
69

rv32emu-demo

HTML
2
star
70

balanced-ternary

Ilustrate how balanced ternary works
Shell
2
star
71

bf-runtime

Brainf*ck runtime engine
C
1
star
72

quotient-filter

(Incomplete) in-memory quotient filter
C
1
star
73

ksort

Linux kernel module that implements and validates sorting algorithms
C
1
star
74

arm-assembler-latex-listings

Arm Assembler language definition for the LaTeX listings package
TeX
1
star
75

clz-tests

Evaluate implementations of count leading zero
C
1
star