• Stars
    star
    452
  • Rank 93,319 (Top 2 %)
  • Language
    C
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated almost 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Simple virtual machine which interprets bytecode.

simple.vm

This repository contains the implementation for a simple virtual-machine, along with a driver which will read a program from a file and execute it via that virtual machine.

In addition to the virtual machine interpreter you'll also find:

  • A simple compiler, written in perl.
    • This will translate from assembly-source into binary-opcodes.
  • A simple decompiler, written in perl.
    • This will translate in the other direction.
  • Several example programs written in our custom assembly-language.
  • An example of embedding the virtual machine in a C host program.
    • Along with the definition of a custom-opcode handler.

This particular virtual machine is intentionally simple, but despite that it is hopefully implemented in a readable fashion. ("Simplicity" here means that we support only a small number of instructions, and the registers the virtual CPU possesses can store strings and integers, but not floating-point values.) This particular virtual machine is register-based, having ten registers which can be used to store strings or integer values.

Compilation

Because the compiler and decompiler are written in Perl they need no special treatment.

The interpretter, written in C, can be built like so:

$ make

This will generate simple-vm and embedded from the contents of src/.

Implementation Notes

Implementing a basic virtual machine, like this one, is a pretty well-understood problem:

  • Load the bytecode that is to be interpreted into a buffer.
  • Fetch an instruction from from the buffer.
    • Decode the instruction, usually via a long case statement.
    • Advance the index such that the next instruction is ready to be interpreted.
    • Continue until you hit a halt, or exit instruction.
  • Cleanup.

The main complications of writing a virtual machine are:

  • Writing a compiler to convert readable source into the binary opcodes which will be actually executed.
    • Users want to write "goto repeat" not "0x10 0x03 0x00".
  • Writing a disassembler for debugging purposes.
  • Implementing a useful range of operations/opcodes.
    • For example "add", "dec", "inc", etc.

This particular virtual machine contains only a few primitives, but it does include the support for labels, looping, conditional jumps, calls and returns. There is also a stack which can be used for storing temporary values, and used for call/ret handling.

The handling of labels in this implementation is perhaps worthy of note, because many simple/demonstration virtual machines don't handle them at all.

In order to support jumping to labels which haven't necessarily been defined yet our compiler keeps a running list of all labels (i.e. possible jump destinations) and when it encounters a jump instruction, or something else that refers to a label, it outputs a placeholder-address, such as:

  • JUMP 0x000
    • In our bytecode that is the three-byte sequence 0x10 0x00 0x00 as the JUMP instruction is defined as 0x10.

After compilation is complete all the targets should have been discovered and the compiler is free to update the generated bytecodes to fill in the appropriate destinations.

NOTE: In our virtual machines all jumps are absolute, so they might look like "JUMP 0x0123" rather than "JUMP -4" or "JUMP +30".

NOTE: The same thing applies for other instructions which handle labels, such as storing the address of a label, making a call, etc.

Embedding

This virtual machine is designed primarily as a learning experience, but it is built with the idea of embedding in mind.

The standard simple-vm binary, which will read opcodes from a file and interpret them, is less than 40k in size.

Because the processing of binary opcodes is handled via a dispatch-table it is trivially possible for you to add your own application-specific opcodes to the system which would allow you to execute tiny compiled, and efficient, programs which can call back into your application when they wish.

There is an example of defining a custom opcode in the file src/embedded.c. This example defines a custom opcode 0xCD, and executes a small program which uses that opcode for demonstration purposes:

 $ ./embedded
 [stdout] Register R01 => 16962 [Hex:4242]
 Custom Handling Here
     Our bytecode is 8 bytes long

Instructions

There are several instruction-types implemented:

  • Storing string/int values into a given register.
  • Mathematical operations:
    • add, and, sub, multiply, divide, incr, dec, or, & xor.
  • Output the contents of a given register. (string/int).
  • Jumping instructions.
    • Conditional and unconditional
  • Comparison of register contents.
    • Against registers, or string/int constants.
  • String to integer conversion, and vice-versa.
  • Stack operations
    • PUSH/POP/CALL/RETURN

The instructions are pretty basic, as this is just a toy, but adding new ones isn't difficult and the available primitives are reasonably useful as-is.

The following are examples of all instructions:

:test
:label
goto  0x44ff      # Jump to the given address
goto  label       # Jump to the given label
jmpnz label       # Jump to label if Zero-Flag is not set
jmpz  label       # Jump to label if Zero-Flag is set

store #1, 33      # store 33 in register 1
store #2, "Steve" # Store the string "Steve" in register 1.
store #1, #3      # register1 is set to the contents of register #3.

exit              # Stop processing.
nop               # Do nothing

print_int #3      # Print the (integer) contents of register 3
print_str #3      # Print the (string) contents of register 3

system #3         # Call the (string) command stored in register 3

add #1, #2, #3    # Add register 2 + register 3 contents, store in reg 1
sub #1, #2, #3    # sub register 2 + register 3 contents, store in reg 1
mul #1, #2, #3    # multiply register 2 + register 3 contents, store in reg 1
concat #1, #2,#3  # store concatenated strings from reg2 + reg3 in reg1.

dec #2            # Decrement the integer in register 2
inc #2            # Increment the integer in register 2

string2int #3     # Change register 3 to have a string from an int
is_integer #3     # Does the given register have an integer content?
int2string #3     # Change register 3 to have an int from a string
is_string  #3     # Does the given register have a string-content?

cmp #3, #4        # Compare contents of reg 3 & 4, set the Z-flag.
cmp #3, 42        # Compare contents of reg 3 with the constant 42.  sets z.
cmp #3, "Moi"     # Compare contents of reg 3 with the constant string "Moi".  sets z.

peek #1, #4       # Load register 1 with the contents of the address in #4.
poke #1, #4       # Set the address stored in register4 with the contents of reg1.
random #2         # Store a random integer in register #2.

push #1           # Store the contents of register #1 in the stack
pop  #1           # Load register #1 with the contents of the stack.
call 0xFFEE       # Call the given address.
call my_label     # Call the defined label
ret               # Return from a called-routine.

Simple Example

The following program will just endlessly output a string:

 :start
      store #1, "I like loops"
      print_str #1
      goto start

This program first stores the string "I like loops" in register 1, then prints that register, before jumping back to the start of the program.

To actually compile this program into bytecode run:

  $ ./compiler ./examples/simple.in

This will produce an output file full of binary-opcodes in the file simple.raw:

  $ od -t x1 -An ./examples/simple.raw
  01 01 0c 49 20 6c 69 6b 65 20 6c 6f 6f 70 73 03
  01 06 00 00

Now we can execute that series of opcodes:

  ./simple-vm ./examples/simple.raw

If you wish to debug the execution then run:

  DEBUG=1 ./simple-vm ./examples/simple.raw

There are more examples stored beneath the examples/ subdirectory in this repository. The file examples/quine.in provides a good example of various features - it outputs its own opcodes.

Fuzz Testing

If you wish to fuzz-test with afl you should find that pretty straight-forward:

  • Install afl itself, as per the instructions.
  • Compile this project, setting CC=afl-gcc in the Makefile.
  • Generate some sample-programs, as starting point, then run the fuzzer.

You can compile each of our bundled samples like so:

 cd examples/
 for i in *.in; do ../compiler $i; done
 cd ..

This will compile examples/*.in into examples/*.raw. Place those in a directory, and start your fuzzer:

 mkdir samples/
 cp examples/*.raw samples/

Now you have ./samples/ containing only compiled programs. You can then mutate/permute those examples under the control of the fuzzer with a command-line like this:

 export FUZZ=1
 afl-fuzz -i ./samples/ -o results/ ./simple-vm @@ 16384

NOTE: Here we cap each run to executing no more than 16384 instructions. This will ensure that programs don't have infinite loops in them.

We set the environmental variable FUZZ to contain 1 solely to disable the use of the system() function. Which might accidentally remove your home-directory, format your drive, or send me a donation!

See Also

A golang port of the virtual-machine compiler and interpreter is available from the following repository:

In addition to that you can find a real virtual-machine inside the embedded scripting engine I wrote, also for golang. In that case a scripting language is parsed and converted into a series of bytecode instructions, which are executed by a virtual machine. Similar to this project, but the bytecode operations are more complex and high-level:

If you're interested in compilers, and interpreters, generally you might enjoy these other projects too:

Steve

More Repositories

1

sysadmin-util

Tools for Linux/Unix sysadmins.
Perl
940
star
2

bookmarks.public

A template for self-hosted bookmarks using HTML & jQuery.
JavaScript
660
star
3

tunneller

Allow internal services, running on localhost, to be accessed over the internet..
Go
457
star
4

deployr

A simple golang application to automate the deployment of software releases.
Go
323
star
5

gobasic

A BASIC interpreter written in golang.
Go
311
star
6

go.vm

A simple virtual machine - compiler & interpreter - written in golang
Go
309
star
7

simple-vpn

A simple VPN allowing mesh-like communication between nodes, over websockets
Go
276
star
8

monkey

An interpreted language written in Go
Go
248
star
9

sysbox

sysadmin/scripting utilities, distributed as a single binary
Go
205
star
10

esp8266

Collection of projects for the WeMos Mini D1
C++
162
star
11

kilua

A minimal text-editor with lua scripting.
C++
158
star
12

sos

Simple Object Storage (I wish I could call it Steve's Simple Storage, or S3 ;)
Go
145
star
13

github-action-publish-binaries

Publish binaries when new releases are made.
Shell
134
star
14

evalfilter

A bytecode-based virtual machine to implement scripting/filtering support in your golang project.
Go
110
star
15

rss2email

Convert RSS feeds to emails
Go
104
star
16

e-comments

External comments for static HTML pages, a lightweight self-hosted disqus alternative.
JavaScript
101
star
17

linux-security-modules

A place to store my toy linux-security modules.
C
87
star
18

marionette

Something like puppet, for the localhost only.
Go
84
star
19

kpie

Simple devilspie-like program for window manipulation, with Lua.
C
77
star
20

dhcp.io

Dynamic DNS - Via Redis, Perl, and Amazon Route53.
Perl
69
star
21

foth

Tutorial-style FORTH implementation written in golang
Go
67
star
22

overseer

A golang-based remote protocol tester for testing sites & service availability
Go
62
star
23

templer

A modular extensible static-site-generator written in perl.
Perl
62
star
24

math-compiler

A simple intel/AMD64 assembly-language compiler for mathematical operations
Go
58
star
25

assembler

Basic X86-64 assembler, written in golang
Go
56
star
26

lighthouse-of-doom

A simple text-based adventure game
C
56
star
27

node-reverse-proxy.js

A reverse HTTP-proxy in node.js
JavaScript
53
star
28

webmail

A golang webmail server.
Go
51
star
29

dotfiles

Yet another dotfile-repository
Emacs Lisp
50
star
30

github2mr

Export all your github repositories to a form suitable for 'myrepos' to work with.
Go
46
star
31

puppet-summary

The Puppet Summary is a web interface providing reporting features for Puppet, it replaces the Puppet Dashboard project
Go
45
star
32

org-worklog

A template for maintaining a work-log, via org-mode.
39
star
33

tweaked.io

The code behind http://tweaked.io/
JavaScript
36
star
34

rss2hook

POST to webhook(s) when new feed-items appear.
Go
35
star
35

pam_pwnd

A PAM module to test passwords against previous leaks at haveibeenpwned.com
C
34
star
36

alphavet

A golang linter to detect functions not in alphabetical order
Go
32
star
37

dns-api-go

The code behind https://dns-api.org/
Go
31
star
38

critical

A simple/minimal TCL interpreter, written in golang
Go
31
star
39

markdownshare.com

The code which was previously used at http://markdownshare.com/
Perl
29
star
40

github-action-tester

Run tests when pull-requests are opened, or commits pushed.
Shell
26
star
41

maildir-tools

Golang-based utility which can be used for scripting Maildir things, and also as a basic email client
Go
22
star
42

chronicle2

Chronicle is a simple blog compiler, written in Perl with minimal dependencies.
Perl
20
star
43

purppura

A server for receiving and processing alerts & events.
Go
20
star
44

implant

Simple utility for embedding files/resources inside golang binaries
Go
20
star
45

dns-api.org

The code which was previously used at https://dns-api.org/
Perl
19
star
46

bfcc

BrainFuck Compiler Challenge
Go
18
star
47

ephemeris

A static blog-compiler
Go
15
star
48

markdownshare

The code behind https://markdownshare.com/
Go
15
star
49

z80-examples

Z80 assembly-language programs.
Makefile
15
star
50

yal

Yet another lisp interpreter
Go
14
star
51

aws-utils

A small collection of AWS utilities, packaged as a single standalone binary.
Go
14
star
52

z80retroshield

Arduino library for driving the Z80 retro-shield.
Shell
12
star
53

Device-Osram-Lightify

Interface to the Osram Lightify system
Perl
12
star
54

github-action-build

Build a project, creating artifacts
Shell
12
star
55

webserver-attacks

Identify attacks against webservers via simple rules
Perl
12
star
56

predis

A redis-server written in Perl.
Perl
11
star
57

da-serverspec

ServerSpec.org configuration for the Debian-Administration cluster.
Ruby
10
star
58

docker-api-gateway

Trivial API-gateway for docker, via HAProxy
Go
10
star
59

http2xmpp

HTTP to XMPP (jabber) bridge.
Perl
9
star
60

nanoexec

Trigger commands over a nanomsg queue
C
9
star
61

go-experiments

Repository containing experiments as I learn about golang
Go
9
star
62

labeller

Script label addition/removal for gmail/gsuite email.
Go
8
star
63

golang-metrics

Automatic submission of system metrics to graphite, for golang applications
Go
8
star
64

ms-lite

A collection of plugins for a qpsmtpd-powered virtual-host aware SMTP system.
Perl
8
star
65

remotehttp

Magic wrapper to deny HTTP-requests to to "local" resources.
Go
8
star
66

dashboard

Redis & node.js powered dashboard skeleton
JavaScript
8
star
67

Buffalo-220-NAS

Installing NFS on a Buffalo 220 NAS device
Shell
8
star
68

asql

A toy utility to process Apache log files via SQL.
Perl
7
star
69

DockerFiles

Container for various dockerfiles.
Shell
6
star
70

yawns

Yet another weblog/news site
Perl
6
star
71

cidr_match.js

A simple module to test whether a given IPv4 address is within a particular CIDR range.
JavaScript
6
star
72

pass

password-store distribution, with plugins.
Shell
6
star
73

knownfs

A FUSE-based filesystem that exports ~/.ssh/known_hosts
Go
6
star
74

mpd-web

Simple HTTP view of an MPD server
Go
6
star
75

mod_writable

Disallow serving writable files under Apache 2.4.x
C
5
star
76

org-diary

Easily maintain a simple work-log / journal with the use of org-mode
Emacs Lisp
5
star
77

mod_blacklist

A simple Apache module to blacklist remote hosts.
C
5
star
78

arduino-mega-z80-simplest

The simplest possible project combining an Arduino Mega and a Zilog Z80 processor
C++
4
star
79

turtle

A simple turtle-implementation, using FORTH as a scripting-language
Go
4
star
80

purple

A simplified version of mauvealert
Perl
3
star
81

subcommands

Easy subcommand handling for a golang based command-line application
Go
3
star
82

thyme

A simple package-building system, using docker
Perl
2
star
83

httpd

Simple golang HTTP server
Go
2
star
84

edinburgh.io

Open pub database
JavaScript
2
star
85

run-directory

A simple application inspired by `run-parts`.
Go
2
star
86

Redis--SQLite

Redis-Compatible module which writes to SQLite
Perl
2
star
87

runme

A quick hack for running commands from README.md files
Go
2
star
88

devopswithdocker.com

Repository created for the Helsinki University course.
Dockerfile
2
star
89

aws-list

Export a dump of all running EC2 instances, along with AMI details, AMI age, etc, etc.
1
star
90

calibre-plugins

A small collection of calibre-plugins.
Python
1
star
91

WebService--Amazon--Route53--Caching

Perl module to cache the results of WebService::Amazon::Route53
Perl
1
star
92

lexing-parsing-linting-stuffs

Code to go with my talk
Python
1
star
93

Test--RemoteServer

The Perl module Test::RemoteServer
Perl
1
star
94

org-tag-cloud

Easily maintain a tag-cloud of org-mode tags.
Emacs Lisp
1
star
95

headerfile

Parse files with simple key:value headers, easily.
Go
1
star
96

z80-cpm-scripting-interpreter

A trivial I/O language, with repl, written in z80 assembler to run under CP/M.
Makefile
1
star