WASM 1.1 compiler / decompiler
A novel Clojure/script library for the WebAssembly (WASM) ecosystem:
- WASM programs as simple immutable Clojure data structures
- Decompiling and compiling WASM binaries
- JVM and browser, no compromise
- Allowing all sorts of crazy WASM analysis and metaprogramming
- Working interactively as opposed to using the command-line
- Generating random WASM programs for runtime testing
- Fully described using Malli
- Robust, backed up by generative testing
Supported platforms:
- Babashka >= v0.3.5 (besides the helins.wasm.schema namespace)
- Browser
- JVM
- NodeJS
All binary processing relies on the powerful BinF library.
Status
The implementation of this library follows the WASM specification very closely.
Even the order of the definitions in the namespaces is identical to the order of the definitions in the WASM binary specification so that they can be read alongside. Indeed, there is no better documentation than the specification itself.
This design also reduces the chance of creating breaking changes. WASM development is very active and new proposals are being built, but there should never be a breaking change within WASM itself. However, since this library is novel, its current status is alpha for the time being, even though the design is robust and well-tested.
The goal is to remain up-to-date with all stable WASM proposals.
Documentation
The full API is available on Cljdoc.
Namespaces follow one naming scheme. In the WASM binary specification, any item is defined by a so-called "non-terminal symbol". For instance, the function section is designated by funcsec.
Names that refer to those non-terminal symbols end with the ' character. For instance, the helins.wasm.read namespace for decompiling WASM code has a funcsec' function which decompiles a function section. Those names do not have docstrings in Cljdoc since it is best to read and follow the WASM specification. Namespaces mimic that specification exactly for that reason.
All other names, such as higher-level abstractions, are fully described on Cljdoc.
Examples
Working examples are available in the helins.wasm.example namespace.
Usage, brief overview
Compilation / decompilation is easy as demonstrated in the example below.
The rest of the document is about understanding and modifying a decompiled program.
In very, very short:
(require 'clojure.pprint
         '[helins.wasm :as wasm])

;; Reading an example file from this repo:
;;
(def decompiled
  (wasm/decompile-file "src/wasm/test.wasm"))

;; Pretty printing decompiled form (Clojure data):
;;
(clojure.pprint/pprint decompiled)

;; Of course, we can recompile it:
;;
(def compiled
  (wasm/compile-file decompiled
                     "/tmp/test2.wasm"))
Working with files is the only JVM-exclusive utility in this library.
WASM binaries are represented as BinF views. For instance, from ClojureScript:
(require '[helins.binf :as binf]
         '[helins.wasm :as wasm])

;; Storage for our decompiled WASM program
;;
(def *decompiled
  (atom nil))

;; Fetching and decompiling WASM source from somewhere
;;
(-> (js/fetch "some_url/some_module.wasm")
    (.then (fn [resp]
             (.arrayBuffer resp)))
    (.then (fn [array-buffer]
             (reset! *decompiled
                     (-> array-buffer
                         ;; Wrapping buffer in a BinF view and preparing it (will set the right endianness)
                         binf/view
                         wasm/prepare-view
                         ;; Decompiling
                         wasm/decompile)))))

;; And later, we can just as well recompile it to a BinF view
;;
(def compiled
  (wasm/compile @*decompiled))
Installation
After adding this library to dependencies, one must also manually add Malli. As of today, an unreleased version is needed:
{metosin/malli {:git/url "https://github.com/metosin/malli"
                :sha     "0e5e3f1ee9bc8d6ea60dc16e59abf9cc295ab510"}}
The imported version (latest release) does not support generation of instructions (and hence, modules).
Namespaces
In summary:
Namespace | About |
---|---|
helins.wasm | Compiling and decompiling WASM modules |
helins.wasm.bin | Defines all simple binary values such as opcodes |
helins.wasm.ir | Simple manipulations of WASM programs in Clojure |
helins.wasm.read | Implementing decompilation (for "experts") |
helins.wasm.schema | Using Malli, describes the WASM binary format in Clojure |
helins.wasm.write | Implementing compilation (for "experts") |
Schema
The Clojure data structures representing WASM programs are almost a direct translation of the WASM binary specification. Very little abstraction has been added on purpose. The goal is to leverage those wonderful data structures while having the illusion of working directly with the binary representation.
The registry of Malli schemas describes everything:
(require '[helins.wasm.schema :as wasm.schema]
         '[malli.core :as malli]
         '[malli.generator :as malli.gen]
         '[malli.util])

;; Merging all needed registries.
;;
(def registry
  (merge (malli/default-schemas)
         (malli.util/schemas)
         (wasm.schema/registry)))

;; What is a `funcsec`?
;;
(get registry
     :wasm/funcsec)

;; Let us generate a random WASM program.
;;
(malli.gen/generate :wasm/module
                    {:registry registry})
Overall shape
A WASM program is a map referred to in the namespaces as a ctx (context). It holds the program itself (WASM sections) as well as a few extra things (akin to the context described in other sections of the WASM specification).
Almost everything is a map, except WASM instructions, which are vectors. All simple values, such as opcodes, remain as binary values (see "Instructions" section for an example).
Sections
In the binary format, most WASM sections are essentially a list of items, such as the data section being a list of data segments. Other parts of the program, such as instructions operating on such a data segment, refer to an item by its index in that list.
However, working with lists of items and addressing those items by index is hard work, especially maintaining those references when things are removed, added, and moved around. Hence, those sections are described by sorted maps of index -> item. They can be sparse and, during compilation, indices (references) will be transparently recomputed into a dense list.
See the helins.wasm.ir namespace for a few functions showing how to handle things like adding a data segment.
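The sparse index -> item idea can be illustrated with plain Clojure. This is only a minimal sketch and not the library's implementation: the segment maps and the function names (next-idx, add-item, dense-indices, densify) are hypothetical, and the real helpers live in helins.wasm.ir.

```clojure
;; Hypothetical section: a sorted map of index -> item.
;; Indices are sparse on purpose (1, 2, 4, 5 and 6 are unused).
(def datasec
  (sorted-map 0 {:name "a"}
              3 {:name "b"}
              7 {:name "c"}))

(defn next-idx
  "Returns an index that is not used yet: one past the highest existing index."
  [section]
  (if-let [[idx _item] (first (rseq section))]
    (inc idx)
    0))

(defn add-item
  "Adds `item` under a fresh index, leaving existing indices untouched."
  [section item]
  (assoc section
         (next-idx section)
         item))

(defn dense-indices
  "Returns a map of old sparse index -> new dense index."
  [section]
  (into (sorted-map)
        (map vector
             (keys section)
             (range))))

(defn densify
  "Returns the items as a dense vector, ordered by their sparse index."
  [section]
  (vec (vals section)))
```

With this shape, removing an item is a plain (dissoc datasec 3) and the remaining items keep their indices, so references to them stay valid until everything is renumbered in one pass, for instance using a mapping such as the one returned by (dense-indices datasec).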
Instructions
Instructions are expressed as vectors where the first item is an opcode and the rest might be so-called "immediates" (i.e. mandatory arguments). Once again, they look almost exactly like the binary format and the official specification is the best documentation.
For example, here is a WASM block which adds 42 to a value from the WASM stack:
(require '[helins.wasm.bin :as wasm.bin])
[wasm.bin/block
 nil
 [wasm.bin/i32-const 42]
 [wasm.bin/i32-add]]
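Because instructions are plain vectors, analysing them needs nothing beyond core Clojure. The sketch below counts opcodes in a block shaped like the one above. To stay self-contained it defines the three opcodes locally with their byte values from the WASM specification instead of requiring helins.wasm.bin, and the function names are hypothetical. It also assumes that every nested vector is an instruction, which holds for this block but might not hold for immediates that are themselves vectors.

```clojure
;; Opcode bytes as defined in the WASM binary specification.
;; Defined locally for illustration (assumption: the library exposes equivalent values).
(def block     0x02)
(def i32-const 0x41)
(def i32-add   0x6A)

(def example-block
  [block
   nil
   [i32-const 42]
   [i32-add]])

(defn instr-seq
  "Returns a depth-first sequence of `instr` and all instructions nested in it.
   Assumes any nested vector is an instruction."
  [instr]
  (tree-seq vector?
            (fn [node]
              (filter vector?
                      node))
            instr))

(defn opcode-frequencies
  "Returns a map of opcode -> number of occurrences in `instr`."
  [instr]
  (frequencies (map first
                    (instr-seq instr))))
```

Calling (opcode-frequencies example-block) walks the outer block and its two nested instructions, yielding one occurrence for each of the three opcodes.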
Modifying a WASM program
Since everything is described in the helins.wasm.schema namespace and since those definitions are well documented in the WASM binary specification, it is fairly easy to create or modify WASM programs. Once one understands the format, it is just common Clojure programming without much surprise.
The helins.wasm.ir namespace ("ir" standing for "Intermediate Representation") provides a few utilities for doing basic things such as adding a function. It is not feature-rich because, usually, doing almost anything is very straightforward and does not require special helpers.
Novel WASM tools
The vast majority of existing WASM tools are implemented in Rust or C++. Doing things such as dead code elimination of WASM is a tedious process performed from the command-line. Building new tools in that ecosystem means abiding by that fact and working exclusively with those native languages.
Hence, this library is one of a kind, offering a powerful interactive environment on the JVM as well as in the browser and leveraging Clojure idioms which are excellent for analyzing WASM code.
Babashka
Currently, Babashka does not support Malli. Hence, the helins.wasm.schema namespace is not supported. However, compilation, decompilation, and everything else work.
This very simple script shows how to decompile a WASM file in the terminal using barely a few lines.
This opens the possibility for quickly developing WASM dev tools that start up fast and, for instance, output some structural information about given binaries.
Running tests
Depending on hardware, tests usually take a few minutes to run.
On the JVM, using Kaocha:
$ ./bin/test/jvm/run
On NodeJS, using Shadow-CLJS:
$ ./bin/test/node/run
# Or testing an advanced build:
$ ./bin/test/node/advanced
Development
Starting in Clojure JVM mode, mentioning an additional Deps alias (here, a local setup of NREPL):
$ ./bin/dev/clojure :nrepl
Starting in CLJS mode using Shadow-CLJS:
$ ./bin/dev/cljs
# Then open ./cljs/index.html
License
Copyright © 2021 Adam Helinski
Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.