BinF stands for "Binary Formats"
Clojure/script library for handling any kind of binary format, protocol ; both in-memory and during IO ; and helping interacting with native libraries and WebAssembly modules.
An authentic Swiss army knife providing:
- Reading, writing, and copying binary data
- Via protocols which enhance host classes (
js/DataView
in JS,ByteBuffer
on the JVM, ...) - Coercions between primitive types
- Cross-platform handling of 64-bit integers
- Excellent support for IO and even memory-mapped files on the JVM
- Extra utilities such as Base64 encoding/decoding, LEB128, ...
- Defining C-like composite types (structs, unions, ...) as EDN
Supported platforms:
- Babashka (besides
helins.binf.native
namespace) - Browser
- JVM
- NodeJS
Rationale
Clojure libraries for handling binary data are typically limited and not very well maintained. BinF is the only library providing a seamless experience between Clojure and Clojurescript for pretty much any use case with an extensive set of tools built with low-level performance in mind. While in beta, it has already been used in production and for involving projects such as a WebAssembly decompiler/compiler.
Examples
All examples from the "Usage" section as well as more complete ones are in the ./src/example/helins/binf directory. They are well-described and meant to be tried out at the REPL.
Also, the helins.binf.dev namespace requires all namespaces of this library (quite a few) and can be used for REPLing around.
Cloning this repo is a fast way of trying things out. See the "Development and testing" section.
Usage
This is an overview.
After getting a sense of the library, it is best to try out full examples and explore the full API which describes more namespaces.
Let us require the main namespaces used in this document:
(require '[helins.binf :as binf]
'[helins.binf.buffer :as binf.buffer])
Buffers and views
BinF is highly versatile because it leverages what the host offers, following the Clojure mindset. The following main concepts must be understood.
A view is an object encompassing a raw chunk of memory and offering utilities for manipulating it: reading and/or writing binary data. Such a chunk of memory could be a byte array or a file. It does not really matter since views abstract those chunks.
More precisely, a view is anything that implements at least some of the protocols
defined in the helins.binf.protocol
namespace. Only rarely will the user
implement anything since BinF already enhances common classes.
On the JVM, those protocols are implemented for the ubiquitous ByteBuffer which is used pretty much everywhere. In JS, they enhance the just-as-ubiquitous DataView.
By enhancing these host classes, code can be reused for many contexts: handling memory, handling a file, a socket, ...
Finally, by definition, a buffer is an opaque byte array which can be manipulated only via a view. It represents the lowest-level of directly accessible memory a host can provide. On the JVM, a buffer is a plain old byte array. In JS, it is an ArrayBuffer or optionally a SharedArrayBuffer.
Many host utilities expect buffers hence it is important to define a coherent story between buffers and views.
Binary data and operations
Types and related operations follow a predictable naming convention.
The following table summarizes primitive binary types and their names:
Type | Description |
---|---|
buffer | Byte array |
f32 | 32-bit float |
f64 | 64-bit float |
i8 | Signed 8-bit integer |
i16 | Signed 16-bit integer |
i32 | Signed 32-bit integer |
i64 | Signed 64-bit integer |
string | String (UTF-8 by default) |
u8 | Unsigned 8-bit integer |
u16 | Unsigned 16-bit integer |
u32 | Unsigned 32-bit integer |
u64 | Unsigned 64-bit integer |
Reading and writing revolve around these types and happen at a specific position in a view. In absolute operations, that position is provided by the user explicitly. In relative operations, views use an internal position they maintain themselves.
It is much more common to use relative operations since it is more common to read or write things in a sequence. For instance, writing a 32-bit integer will then advance that internal position by 4 bytes.
When writing integers, sign do not matter. For instance, instead of specifying
i32
or u32
, b32
is used since only the bit pattern matters.
These operations are gathered in the core helins.binf
namespace. Some examples showing the naming convention are:
Operation | Description |
---|---|
wa-b32 | Write a 32-bit integer at an absolute position |
rr-i64 | Read a signed 64-bit integer from the current relative position |
wr-buffer | Copy the given buffer to the current relative position of the view |
ra-string | Read a string from an absolute position |
The first letter denotes r
eading or w
riting, the second letter denotes
a
bsolute or r
elative.
It is best to follow that naming convention when writing custom functions.
For instance, writing and reading a YYYY/mm/dd
date "relatively":
(defn wr-date
[view year month day]
(-> view
(binf/wr-b16 year)
(binf/wr-b8 month)
(binf/wr-b8 day)))
(defn rr-date
[view]
[(binf/rr-u16 view)
(binf/rr-u8 view)
(binf/rr-u8 view)])
Creating a view from a buffer
Complete example in the helins.binf.example namespace.
;; Allocating a buffer of 1024 bytes
;;
(def my-buffer
(binf.buffer/alloc 1024))
;; Wrapping the buffer in view
;;
(def my-view
(binf/view my-buffer))
;; The buffer can always be extracted from its view
;;
(binf/backing-buffer my-view)
Using our date functions defined in the previous section:
;; From the current position (0 for a new view)
;;
(let [position-date (binf/position my-view)]
(-> my-view
(wr-date 2021
3
16)
(binf/seek position-date)
rr-date))
;; => [2021 3 16]
Creating a view over a memory-mapped file (JVM)
Complete example in the helins.binf.example.mmap-file namespace.
On the JVM, BinF protocols already extends the popular ByteBuffer
used
extensively by many utilities, amongst them IO ones (about anything in
java.nio
).
One notable mention is the child class
MappedByteBuffer,
a special type of ByteBuffer
which memory-maps a file. This technique usually
results in fast and efficient IO for larger file while being easy to follow.
Our date functions used in the previous section be applied to such a memory-mapped file without any change.
There are a few ways for obtaining a MappedByteBuffer
, here is one example:
(import 'java.io.RandomAccessFile
'java.nio.channels.FileChannel$MapMode)
(with-open [file (RandomAccessFile. "/tmp/binf-example.dat"
"rw")]
(let [view (-> file
.getChannel
(.map FileChannel$MapMode/READ_WRITE
;; From byte 0 in the file
0
;; A size in bytes, we know a date is 4 bytes
4))]
(-> view
;; Writing date
(wr-date 2021
3
16)
;; Ensuring changes are persisted on disk
.force
;; Reading it back from the start of the file
(binf/seek 0)
rr-date)))
Creating a view from a view
It is often useful to create "sub-views" of a view. Akin to wrapping a buffer, a view can wrap a view:
;; An offset of a 100 bytes with a window of 200 bytes
;;
(def sub-view
(binf/view my-view
100
200))
;; The position of that sub-view starts transparently at 0
;;
(= 0
(binf/position sub-view))
;; Contains 200 bytes indeed
;;
(= 200
(binf/limit sub-view))
Working with dynamically-sized data
While reading data in a sequence is easy, writing can sometimes be a bit tricky since one has to decide how much memory to allocate.
Sometimes, the lenght of the data is known in advance and writing is straightforward.
Sometimes, size can be estimated and one can pessimistically allocate more than needed to cover all cases.
Sometimes, size is unknown but easy to compute. A first pass throught the data computes the total number of bytes, a second pass actually writes it without fearing of overflowing and having to check defensively if there is enough space.
And sometimes, size is not trivial to compute or impossible. In one pass, the user must check defensively if there is enough memory for the next bit of data (eg. a date) and then write that bit.
Anyway, when space is lacking, the user can grow a view, meaning copying in one go the content of a view to a new bigger one:
;; Asking for a view which contains 256 additional bytes.
;; Current position is preserved.
;;
(def my-view-2
(binf/grow my-view
256)
Working with 64-bit integers
Working with 64-bit integers is tricky since the JVM does not have unsigned ones
and JS engines do not even really have 64-bit integers at all. The
helins.binf.int64
namespace provide utilities for working with them in a
cross-platform fashion.
It is not the most beautiful experience one will encounter in the course of a lifetime but it works and does the job pretty efficiently.
Extra utilities
Other namespaces provides utilities such as Base64 encoding/decoding, LEB128 encoding/decoding, ...
It is best to navigate through the API.
Interacting with native libraries and WebAssembly
The following namespace is experimental and not yet considered stable.
Complete example in the helins.binf.example.cabi namespace.
Clojure is expanding, reaching new fronts through GraalVM, WebAssembly, new ways of calling native code.
Although the C language does not have a defined ABI, many tools and languages understand a C-like ABI. For instance, the Rust programming language allows for defining structures which follow the same rules as C structures. This is because such rules are often well-defined, straightforward, and there is a need for different languages and tools to understand each other (eg. a shared native library).
The helins.binf.cabi
namespace provides utilities for following those rules,
for instance when defining structures (eg. order of data members, specific aligment
of members depending on size, ...)
Those definitions can be reused for different architectures and ultimately end up being plain old EDN, meaning they can be used in many different ways, especially in combination with the view utilities seen before.
For instance, on the JVM, DirectByteBuffer
which already extends view
protocols is often used in JNI for calling native code. In JS, WebAssembly
memories
are buffers which can be wrapped in views. This provides exciting
possibilities.
Here is an example of defining a C structure for our date. Let us supposed it is meant to be used with WebAssembly which is (as of today) 32-bit:
(require '[helins.binf.cabi :as binf.cabi])
;; This information map defines a 32-bit modern architecture where words
;; are 4 bytes
;;
(def env32
(binf.cabi/env 4))
(= env32
{:binf.cabi/align 4
:binf.cabi.pointer/n-byte 4})
;; Defining a function computing our C date structure
;;
(def fn-struct-date
(binf.cabi/struct :MyDate
[[:year binf.cabi/u16]
[:month binf.cabi/u8]
[:day binf.cabi/u8]]))
;; Computing our C date structure as EDN for a WebAssembly environment
;;
(= (fn-struct-date env32)
{:binf.cabi/align 2
:binf.cabi/n-byte 4
:binf.cabi/type :struct
:binf.cabi.struct/layout [:year
:month
:day]
:binf.cabi.struct/member+ {:day {:binf.cabi/align 1
:binf.cabi/n-byte 1
:binf.cabi/offset 3
:binf.cabi/type :u8}
:month {:binf.cabi/align 1
:binf.cabi/n-byte 1
:binf.cabi/offset 2
:binf.cabi/type :u8}
:year {:binf.cabi/align 2
:binf.cabi/n-byte 2
:binf.cabi/offset 0
:binf.cabi/type :u16}}
:binf.cabi.struct/type :MyDate})
This date structure, in a 32-bit WebAssembly, is 4 bytes, aligns on a multiple
of 2 bytes. It is a :struct
called :MyDate
and all data members are clearly
layed out with their memory offsets computed.
A more challenging example would not be so easy to compute by hand.
Development and testing
This repository is organized with Babashka, a wonderful tool for any Clojurist.
All tasks can be listed by running:
$ bb tasks
For instance, task starting a Clojure dev environment:
$ bb dev:clojure
License
Copyright © 2020 Adam Helinski and Contributors
Licensed under the term of the Mozilla Public License 2.0, see LICENSE.