RSM is a virtual computer in the form of a virtual machine.
- simple RISC instruction set with plenty of general-purpose registers.
- virtual memory means programs have no access to host memory addresses and all memory accesses are checked. It also means all RSM programs have the same address space (currently 48 bits; 256 TiB) and could even be migrated across hosts at runtime.
- embraces the linear memory model -- addressing memory is a simple and easy-to-understand way to deal with data. Prefers memory-mapped I/O over custom system calls.
- multi-threaded Go-style task scheduler
- portable and embeddable, without dependencies. (RSM can even build without libc.)
- entire API in one header: src/rsm.h
- simple explicit assembly language
- compiles to very compact, self-contained "ROM" files
- RSM comes with an integrated assembler, making it possible to compile & run code at runtime. (It can be disabled with a macro if desired in embedding scenarios.)
- includes an AST API for code generation without an intermediate assembly step, useful if you want to make a compiler that targets RSM.
Project goals:
- learn, have fun -- simplicity
- substrate, a thing to make other things on
- longevity -- I want to be able to run a (multimedia) program in 10+ years
"RSM" initially stands for "rsms's smol machine" but can also be interpreted as "Really smol machine", or "Raggedy-ass special mumbojumbo", or the recursive acronym "RSM smol machine" (in case you miss the golden days of PHP), or anything you'd like it to mean! Your imagination is really the limit here my friend.
Status of this project: This is a passion project and thus is not "production grade" stuff. The instruction set and semantics are changing. I'd be thrilled and happy if you play with RSM and build stuff on it, but please do keep in mind that stuff will change. Contributions are welcome after an initial discussion.
Work in progress, TODO:
- Finish the scheduler
- Framebuffer for drawing graphics (maybe even webgpu?)
- Audio
- Lots of small pieces here and there, marked with "TODO" in the code
Try it
Download a binary for your platform and try this:
$ cat > hello.rsm <<END
fun main(i32) {
  const STDOUT = 1
  data message = "Hello world\n"
  R0 = message  // address of string
  R1 = 12       // length of string
  R0 = write R0 R1 STDOUT
}
END
$ rsm hello.rsm
Hello world
$
Building & running
$ ./build.sh -debug
$ ./out/debug/rsm -d -R0=15 examples/factorial.rsm
# R0 will contain the result 1307674368000
You'll need the following things to build rsm:
- bash (or a bash-compatible shell like zsh)
- ninja (or a ninja-compatible program like samurai)
- C11 compiler with libc (e.g. clang or GCC)
You can use rsm as a really awkward calculator:
$ echo 'fun x() { R0 = R0 * 2; ret; }' | out/debug/rsm -d -R0=123
# R0 will contain the result 246
RSM assembly can be compiled into a ROM file which can later be executed:
$ echo 'fun x() { R0 = R0 * 2; ret; }' | out/debug/rsm -o multiply.rom
$ out/debug/rsm -d -R0=123 multiply.rom
# R0 will contain the result 246
Example
$ cat <<EXAMPLE > example.rsm
fun factorial(i32) i32 {
  R1 = R0       // ACC = n (argument 0)
  R0 = 1        // RES (return value 0)
  ifz R1 end    // if n==0 goto end
b1:             // <- [b0] b1
  R0 = R1 * R0  // RES = ACC * RES
  R1 = R1 - 1   // ACC = ACC - 1
  if R1 b1      // if n!=0 goto b1
end:            // <- b0 [b1]
  ret           // RES is at R0
}
EXAMPLE
$ out/debug/rsm -d -R0=15 example.rsm
# R0 will contain the result 1307674368000
See the examples/ directory for more.
Instruction Set Architecture
Instructions are fixed-size, 32 bits wide, little endian.
The PC and jump and branch destinations are counted in instructions rather than bytes.
There is room for 256 operations and 32+32 (int+fp) registers (8-bit OP, 5-bit register fields).
Most instructions accept either a register or an immediate value (when the i bit is set) as their last argument.
        ┌───────────────┬─────────┬─────────┬─────────┬─┬───────────────┐
    bit │3 3 2 2 2 2 2 2│2 2 2 2 1│1 1 1 1 1│1 1 1 1  │ │               │
        │1 0 9 8 7 6 5 4│3 2 1 0 9│8 7 6 5 4│3 2 1 0 9│8│7 6 5 4 3 2 1 0│
        ├───────────────┼─────────┼─────────┼─────────┼─┼───────────────┤
   ABCD │     D (8)     │  C (5)  │  B (5)  │  A (5)  │i│     OP (8)    │
        ├───────────────┴─────────┼─────────┼─────────┼─┼───────────────┤
   ABCw │          C (13)         │  B (5)  │  A (5)  │i│     OP (8)    │
        ├─────────────────────────┴─────────┼─────────┼─┼───────────────┤
    ABw │               B (18)              │  A (5)  │i│     OP (8)    │
        ├───────────────────────────────────┴─────────┼─┼───────────────┤
     Aw │                    A (23)                   │i│     OP (8)    │
        └─────────────────────────────────────────────┴─┴───────────────┘
Registers:
- 30 general-purpose integer registers R0…R29
- 30 general-purpose floating-point registers F0…F29
- Context register CTX (R30)
- Stack pointer SP (R31)
- Floating-point status FPSR (F31)
- TODO: is a fp control reg needed for stuff like 0div traps? No... no.
Calling convention
- first 8 integer argument/return values in R0…R7, rest on stack
- first 8 F.P. argument/return values in F0…F7, rest on stack
- anything larger than the register size goes on stack
- caller saves R0…R18, F0…F18 (owned by callee)
- callee saves R19…R29, F19…F29 (owned by caller)
- convention inspired by AAPCS64
Callee-owned registers
Callee-owned (caller-saved, temporary) registers. The caller needs to save these before a call if it uses them; the callee can use them freely.
R0…R7 1st…8th integer argument/return value
F0…F7 1st…8th floating-point argument/return value
R8…R18 General purpose
F8…F18 General purpose
Caller-owned registers
Caller-owned (callee-saved, long-lived) registers. The caller does not need to save these registers. A callee that uses them must save their values and restore them before returning.
R19…R29 General purpose
F19…F29 General purpose
SP (R31) Stack pointer
Special registers
CTX (R30) Context (like AAPCS platform reg and Go's G)
SP (R31) Stack pointer
- (F30) Reserved (unused)
FPSR (F31) Floating-point status
Assembly language
Syntax
Whitespace is ignored.
file = (fundef | constdef | datadef)*
constdef = "const" name type? "=" expr ";"
datadef = "data" name (type ("=" expr)? | "=" expr) ";"
fundef = "fun" name "(" params? ")" result? funbody?
params = param ("," param)*
result = param ("," param)*
param = name type | type
funbody = "{" block0? block* "}"
block0 = blockstmt*
block = name ":" blockstmt*
blockstmt = operation | assignment | binop | constdef | datadef
type = inttype | arraytype
inttype = "i1" | "i8" | "i16" | "i32" | "i64"
arraytype = type "[" intlit "]"
operation = opcode operand*
  ; ifz R1 end
binop = operand ("-" | "+" | "*" | "/") operand
; x + 3
assignment = reg "=" (operation | operand) ";"
operand = reg | literal | name
literal = intlit
intlit = "-"? (binlit | declit | hexlit)
binlit = "0b" ("0" | "1")+
declit = (0-9)+
hexlit = "0x" (0-9A-Fa-f)+
name = ("_" | A-Za-z | uniprint) ("_" | A-Za-z | 0-9 | uniprint)*
uniprint = <utf8 encoding of printable unicode codepoint>
opcode = copy | copyv
| load | load4u | load4s | load2u | load2s | load1u | load1s
| store | store4 | store2 | store1
| push | pop
| add | sub | mul | adds | subs | muls
| div | mod
| and | or | xor | shl | shrs | shru | binv
| not
| eq | neq | ltu | lts | lteu | ltes | gtu | gts | gteu | gtes
| if | ifz | call | jump | ret
| mcopy | mcmp
| write | read
Comments are ignored and can appear wherever whitespace can appear.
comment = linecomment | blockcomment
linecomment = "//" <any character except LF> <LF>
blockcomment = "/*" <any character> "*/"