Handling binary formats in all shapes and forms

BinF stands for "Binary Formats"

Clojure/script library for handling any kind of binary format or protocol, both in memory and during IO, and for interacting with native libraries and WebAssembly modules.

An authentic Swiss army knife providing:

  • Reading, writing, and copying binary data
  • Via protocols which enhance host classes (js/DataView in JS, ByteBuffer on the JVM, ...)
  • Coercions between primitive types
  • Cross-platform handling of 64-bit integers
  • Excellent support for IO and even memory-mapped files on the JVM
  • Extra utilities such as Base64 encoding/decoding, LEB128, ...
  • Defining C-like composite types (structs, unions, ...) as EDN

Supported platforms:

  • Babashka (except for the helins.binf.native namespace)
  • Browser
  • JVM
  • NodeJS

Rationale

Clojure libraries for handling binary data are typically limited and not very well maintained. BinF is the only library providing a seamless experience between Clojure and Clojurescript for pretty much any use case, with an extensive set of tools built with low-level performance in mind. While still in beta, it has already been used in production and for involved projects such as a WebAssembly decompiler/compiler.

Examples

All examples from the "Usage" section as well as more complete ones are in the ./src/example/helins/binf directory. They are well-described and meant to be tried out at the REPL.

Also, the helins.binf.dev namespace requires all namespaces of this library (quite a few) and can be used for REPLing around.

Cloning this repo is a fast way of trying things out. See the "Development and testing" section.

Usage

This is an overview.

After getting a sense of the library, it is best to try out full examples and explore the full API which describes more namespaces.

Let us require the main namespaces used in this document:

(require '[helins.binf        :as binf]
         '[helins.binf.buffer :as binf.buffer])

Buffers and views

BinF is highly versatile because it leverages what the host offers, following the Clojure mindset. The following main concepts must be understood.

A view is an object encompassing a raw chunk of memory and offering utilities for manipulating it: reading and/or writing binary data. Such a chunk of memory could be a byte array or a file. It does not really matter since views abstract those chunks.

More precisely, a view is anything that implements at least some of the protocols defined in the helins.binf.protocol namespace. Only rarely will the user implement anything since BinF already enhances common classes.

On the JVM, those protocols are implemented for the ubiquitous ByteBuffer which is used pretty much everywhere. In JS, they enhance the just-as-ubiquitous DataView.

By enhancing these host classes, code can be reused for many contexts: handling memory, handling a file, a socket, ...

Finally, by definition, a buffer is an opaque byte array which can be manipulated only via a view. It represents the lowest level of directly accessible memory a host can provide. On the JVM, a buffer is a plain old byte array. In JS, it is an ArrayBuffer or optionally a SharedArrayBuffer.

Many host utilities expect buffers, hence the need for a coherent story between buffers and views.
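On the JVM, this relationship can be observed with plain Java interop. The following minimal sketch does not use BinF at all, only `java.nio.ByteBuffer`. A byte array plays the role of the buffer and a `ByteBuffer` wrapping it plays the role of the view, so writes through the view land directly in the array (the names `raw-buffer` and `raw-view` are only illustrative).

```clojure
(import 'java.nio.ByteBuffer)

;; The buffer: a plain byte array of 8 bytes
(def raw-buffer
  (byte-array 8))

;; The view: a ByteBuffer wrapping that array (no copy is made)
(def ^ByteBuffer raw-view
  (ByteBuffer/wrap raw-buffer))

;; Writing through the view modifies the underlying array
(.put raw-view (int 0) (byte 42))

;; The very same array can be recovered from the view
(identical? raw-buffer (.array raw-view))  ;; => true
(aget raw-buffer 0)                        ;; => 42
```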

Binary data and operations

Types and related operations follow a predictable naming convention.

The following table summarizes primitive binary types and their names:

Type    Description
buffer  Byte array
f32     32-bit float
f64     64-bit float
i8      Signed 8-bit integer
i16     Signed 16-bit integer
i32     Signed 32-bit integer
i64     Signed 64-bit integer
string  String (UTF-8 by default)
u8      Unsigned 8-bit integer
u16     Unsigned 16-bit integer
u32     Unsigned 32-bit integer
u64     Unsigned 64-bit integer

Reading and writing revolve around these types and happen at a specific position in a view. In absolute operations, that position is provided by the user explicitly. In relative operations, views use an internal position they maintain themselves.

Relative operations are much more common since data is usually read or written in sequence. For instance, relatively writing a 32-bit integer advances that internal position by 4 bytes.
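The difference between absolute and relative operations mirrors what `ByteBuffer` itself offers. Here is a minimal sketch using plain Java interop rather than the BinF functions (`bb-abs` is only an illustrative name).

```clojure
(import 'java.nio.ByteBuffer)

(def ^ByteBuffer bb-abs
  (ByteBuffer/allocate 16))

;; Relative write: happens at the current position (0) and advances it by 4 bytes
(.putInt bb-abs (int 2021))
(.position bb-abs)            ;; => 4

;; Absolute write: explicit position 8, the internal position is left untouched
(.putInt bb-abs (int 8) (int 7))
(.position bb-abs)            ;; => 4

;; Absolute read from position 0
(.getInt bb-abs (int 0))      ;; => 2021
```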

When writing integers, sign does not matter. For instance, instead of specifying i32 or u32, b32 is used since only the bit pattern matters.
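To see why only the bit pattern matters, here is a small sketch with a plain `ByteBuffer` (again, not the BinF API). Writing 255 stores the byte 0xFF, which reads back as -1 or 255 depending on whether it is interpreted as signed or unsigned.

```clojure
(import 'java.nio.ByteBuffer)

(def ^ByteBuffer bb-sign
  (ByteBuffer/allocate 1))

;; Writing 255 stores the bit pattern 0xFF, exactly like writing -1 would
(.put bb-sign (int 0) (unchecked-byte 255))

;; Read as a signed 8-bit integer: -1
(def as-i8
  (.get bb-sign (int 0)))

;; Read as an unsigned 8-bit integer by masking the sign extension away: 255
(def as-u8
  (bit-and (.get bb-sign (int 0)) 0xFF))
```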

These operations are gathered in the core helins.binf namespace. Some examples showing the naming convention are:

Operation  Description
wa-b32     Write a 32-bit integer at an absolute position
rr-i64     Read a signed 64-bit integer from the current relative position
wr-buffer  Copy the given buffer to the current relative position of the view
ra-string  Read a string from an absolute position

The first letter denotes reading (r) or writing (w), the second one absolute (a) or relative (r).

It is best to follow that naming convention when writing custom functions.

For instance, writing and reading a YYYY/mm/dd date "relatively":

(defn wr-date
  [view year month day]
  (-> view
      (binf/wr-b16 year)
      (binf/wr-b8 month)
      (binf/wr-b8 day)))


(defn rr-date
  [view]
  [(binf/rr-u16 view)
   (binf/rr-u8 view)
   (binf/rr-u8 view)])

Creating a view from a buffer

Complete example in the helins.binf.example namespace.

;; Allocating a buffer of 1024 bytes
;;
(def my-buffer
     (binf.buffer/alloc 1024))

;; Wrapping the buffer in a view
;;
(def my-view
     (binf/view my-buffer))

;; The buffer can always be extracted from its view
;;
(binf/backing-buffer my-view)

Using our date functions defined in the previous section:

;; From the current position (0 for a new view)
;;
(let [position-date (binf/position my-view)]
  (-> my-view
      (wr-date 2021
               3
               16)
      (binf/seek position-date)
      rr-date))

;; => [2021 3 16]

Creating a view over a memory-mapped file (JVM)

Complete example in the helins.binf.example.mmap-file namespace.

On the JVM, BinF protocols already extend the popular ByteBuffer, used extensively by many utilities, including IO ones (just about anything in java.nio).

One notable mention is the child class MappedByteBuffer, a special type of ByteBuffer which memory-maps a file. This technique usually results in fast and efficient IO for larger files while being easy to follow.

Our date functions from the previous section can be applied to such a memory-mapped file without any change.

There are a few ways of obtaining a MappedByteBuffer. Here is one example:

(import 'java.io.RandomAccessFile
        'java.nio.channels.FileChannel$MapMode)

(with-open [file (RandomAccessFile. "/tmp/binf-example.dat"
                                    "rw")]
  (let [view (-> file
                 .getChannel
                 (.map FileChannel$MapMode/READ_WRITE
                       ;; From byte 0 in the file
                       0
                       ;; A size in bytes, we know a date is 4 bytes
                       4))]
    (-> view
        ;; Writing date
        (wr-date 2021
                 3
                 16)
        ;; Ensuring changes are persisted on disk
        .force
        ;; Reading it back from the start of the file
        (binf/seek 0)
        rr-date)))

Creating a view from a view

It is often useful to create "sub-views" of a view. Akin to wrapping a buffer, a view can wrap a view:

;; An offset of 100 bytes with a window of 200 bytes
;;
(def sub-view
     (binf/view my-view
                100
                200))

;; The position of that sub-view starts transparently at 0
;;
(= 0
   (binf/position sub-view))

;; Contains 200 bytes indeed
;;
(= 200
   (binf/limit sub-view))
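For comparison, a similar window can be obtained natively from a `ByteBuffer`. This is a plain interop sketch, not necessarily how BinF implements sub-views. Duplicating the parent, narrowing its position and limit, then slicing yields a view that shares memory with the parent and starts at position 0 (`parent` and `child` are illustrative names).

```clojure
(import 'java.nio.ByteBuffer)

(def ^ByteBuffer parent
  (ByteBuffer/allocate 1024))

;; Marks byte 100 of the parent so that memory sharing is visible
(.put parent (int 100) (byte 7))

;; Sub-view: offset of 100 bytes, window of 200 bytes
(def ^ByteBuffer child
  (let [window (.duplicate parent)]  ; own position/limit, same memory
    (.position window (int 100))
    (.limit window (int 300))
    (.slice window)))

(.position child)      ;; => 0
(.limit child)         ;; => 200
(.get child (int 0))   ;; => 7
```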

Working with dynamically-sized data

While reading data in a sequence is easy, writing can sometimes be a bit tricky since one has to decide how much memory to allocate.

Sometimes, the length of the data is known in advance and writing is straightforward.

Sometimes, size can be estimated and one can pessimistically allocate more than needed to cover all cases.

Sometimes, size is unknown but easy to compute. A first pass through the data computes the total number of bytes, and a second pass actually writes it without fear of overflowing and without having to defensively check whether there is enough space.

And sometimes, size is hard or impossible to compute in advance. In a single pass, the user must then defensively check whether there is enough memory for the next piece of data (e.g. a date) before writing it.
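The two-pass strategy can be sketched with a plain `ByteBuffer`. The format below (each string prefixed by its byte length as a 16-bit integer) and the `encode-strings` function are hypothetical, chosen only for illustration. The first pass sums the exact number of bytes and the second pass writes without any further checks.

```clojure
(import 'java.nio.ByteBuffer
        'java.nio.charset.StandardCharsets)

(defn- utf8-bytes
  [^String s]
  (.getBytes s StandardCharsets/UTF_8))

(defn encode-strings
  "Hypothetical format: each string is written as its byte length (16 bits)
   followed by its UTF-8 bytes."
  [strings]
  (let [encoded (mapv utf8-bytes strings)
        ;; First pass: exact number of bytes needed
        n-byte  (reduce + (map (fn [^bytes b] (+ 2 (alength b)))
                               encoded))
        bb      (ByteBuffer/allocate n-byte)]
    ;; Second pass: writing, with no risk of overflowing
    (doseq [^bytes b encoded]
      (.putShort bb (short (alength b)))
      (.put bb b))
    bb))
```

For instance, `(encode-strings ["hi"])` allocates exactly 4 bytes: 2 for the length and 2 for the characters.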

Either way, when space runs out, the user can grow a view, meaning copying the content of a view in one go to a new, bigger one:

;; Asking for a view which contains 256 additional bytes.
;; Current position is preserved.
;;
(def my-view-2
     (binf/grow my-view
                256))
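To give a sense of what growing involves, here is a minimal sketch on a plain `ByteBuffer`. `grow-bb` is a hypothetical helper, not BinF's implementation of `binf/grow`. It allocates a bigger buffer, copies the whole content in one go, and restores the original position.

```clojure
(import 'java.nio.ByteBuffer)

(defn grow-bb
  "Hypothetical helper: returns a new ByteBuffer with `n-additional` more bytes
   than `bb`, holding a copy of its whole content, at the same position."
  [^ByteBuffer bb n-additional]
  (let [position (.position bb)
        bigger   (ByteBuffer/allocate (+ (.capacity bb) n-additional))
        source   (doto (.duplicate bb)  ; leaves `bb` itself untouched
                   (.clear))]           ; covers the full capacity
    ;; Copies everything in one go, then restores the position
    (.put bigger source)
    (.position bigger position)
    bigger))
```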

Working with 64-bit integers

Working with 64-bit integers is tricky since the JVM does not have unsigned ones and JS engines do not even really have 64-bit integers at all (regular JS numbers are 64-bit floats, which cannot exactly represent integers beyond 2^53). The helins.binf.int64 namespace provides utilities for working with them in a cross-platform fashion.

It is not the most beautiful experience one will encounter in the course of a lifetime but it works and does the job pretty efficiently.
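On the JVM alone, the standard library already offers unsigned arithmetic on the signed `long` type through static methods of `java.lang.Long` (Java 8+). This sketch only illustrates the underlying problem and does not use `helins.binf.int64`. The same 64 bits mean -1 or 2^64 - 1 depending on interpretation.

```clojure
;; Same 64 bits: -1 as a signed long, 2^64 - 1 as an unsigned one
(def all-ones -1)

(Long/toUnsignedString all-ones)            ;; => "18446744073709551615"

;; Signed and unsigned comparisons disagree on those bits
(< all-ones 1)                              ;; => true
(pos? (Long/compareUnsigned all-ones 1))    ;; => true

;; Unsigned division: (2^64 - 1) / 2
(Long/divideUnsigned all-ones 2)            ;; => 9223372036854775807
```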

Extra utilities

Other namespaces provide utilities such as Base64 encoding/decoding, LEB128 encoding/decoding, ...

It is best to navigate through the API.
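As an illustration of one of these encodings, here is a minimal sketch of unsigned LEB128 in plain Clojure. It is independent from BinF and its function names are hypothetical. Each byte carries 7 bits of the integer, least significant group first, and the high bit signals that more bytes follow.

```clojure
(defn encode-uleb128
  "Encodes a non-negative integer as a vector of unsigned LEB128 bytes."
  [n]
  (loop [n   n
         out []]
    (let [group (bit-and n 0x7F)
          n'    (unsigned-bit-shift-right n 7)]
      (if (zero? n')
        ;; Last group: high bit left clear
        (conj out group)
        ;; More groups follow: high bit set
        (recur n' (conj out (bit-or group 0x80)))))))

(defn decode-uleb128
  "Decodes a sequence of unsigned LEB128 bytes back into an integer."
  [bs]
  (reduce (fn [acc [i b]]
            (bit-or acc (bit-shift-left (bit-and b 0x7F) (* 7 i))))
          0
          (map-indexed vector bs)))

(encode-uleb128 624485)  ;; => [229 142 38], i.e. [0xE5 0x8E 0x26]
```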

Interacting with native libraries and WebAssembly

The following namespace is experimental and not yet considered stable.

Complete example in the helins.binf.example.cabi namespace.

Clojure is expanding, reaching new fronts through GraalVM, WebAssembly, and new ways of calling native code.

Although the C language does not have a defined ABI, many tools and languages understand a C-like ABI. For instance, the Rust programming language allows for defining structures which follow the same rules as C structures. This is because such rules are often well-defined, straightforward, and there is a need for different languages and tools to understand each other (eg. a shared native library).

The helins.binf.cabi namespace provides utilities for following those rules, for instance when defining structures (e.g. order of data members, specific alignment of members depending on their size, ...).
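The core of those rules fits in a few lines. The following sketch is a deliberate simplification and does not reflect the `helins.binf.cabi` API. The hypothetical `struct-layout` assumes every member is a primitive aligned on its own size, places each member at the next suitably aligned offset, and pads the total size to a multiple of the largest alignment.

```clojure
(defn align-up
  "Rounds `offset` up to the next multiple of `align`."
  [offset align]
  (* align (quot (+ offset align -1) align)))

(defn struct-layout
  "Lays out members given as [name n-byte] pairs, assuming each primitive
   aligns on its own size."
  [members]
  (let [[offsets end] (reduce (fn [[acc offset] [member n-byte]]
                                (let [start (align-up offset n-byte)]
                                  [(assoc acc member start) (+ start n-byte)]))
                              [{} 0]
                              members)
        align         (apply max 1 (map second members))]
    {:align   align
     :n-byte  (align-up end align)
     :offsets offsets}))

(struct-layout [[:year 2] [:month 1] [:day 1]])
;; => {:align 2, :n-byte 4, :offsets {:year 0, :month 2, :day 3}}

;; A 1-byte member before a 4-byte one forces 3 bytes of padding
(struct-layout [[:flag 1] [:value 4]])
;; => {:align 4, :n-byte 8, :offsets {:flag 0, :value 4}}
```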

Those definitions can be reused for different architectures and ultimately end up being plain old EDN, meaning they can be used in many different ways, especially in combination with the view utilities seen before.

For instance, on the JVM, DirectByteBuffer, which already supports the view protocols, is often used with JNI for calling native code. In JS, WebAssembly memories are buffers which can be wrapped in views. This opens up exciting possibilities.

Here is an example of defining a C structure for our date. Let us suppose it is meant to be used with WebAssembly, which is (as of today) 32-bit:

(require '[helins.binf.cabi :as binf.cabi])


;; This information map defines a 32-bit modern architecture where words
;; are 4 bytes
;;
(def env32
     (binf.cabi/env 4))

(=  env32

    {:binf.cabi/align          4
     :binf.cabi.pointer/n-byte 4})


;; Defining a function computing our C date structure
;;
(def fn-struct-date
     (binf.cabi/struct :MyDate
                       [[:year  binf.cabi/u16]
                        [:month binf.cabi/u8]
                        [:day   binf.cabi/u8]]))


;; Computing our C date structure as EDN for a WebAssembly environment
;;
(= (fn-struct-date env32)

   {:binf.cabi/align          2
    :binf.cabi/n-byte         4
    :binf.cabi/type           :struct
    :binf.cabi.struct/layout  [:year
                               :month
                               :day]
    :binf.cabi.struct/member+ {:day   {:binf.cabi/align  1
                                       :binf.cabi/n-byte 1
                                       :binf.cabi/offset 3
                                       :binf.cabi/type   :u8}
                               :month {:binf.cabi/align  1
                                       :binf.cabi/n-byte 1
                                       :binf.cabi/offset 2
                                       :binf.cabi/type  :u8}
                               :year  {:binf.cabi/align  2
                                       :binf.cabi/n-byte 2
                                       :binf.cabi/offset 0 
                                       :binf.cabi/type   :u16}}
    :binf.cabi.struct/type    :MyDate})

In a 32-bit WebAssembly environment, this date structure is 4 bytes long and aligns on a multiple of 2 bytes. It is a :struct called :MyDate and all data members are clearly laid out with their memory offsets computed.

A more challenging example would not be so easy to compute by hand.
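Once offsets are known, writing such a structure boils down to absolute writes. The following sketch uses plain Java interop rather than BinF, with the offsets of the :MyDate layout shown above copied by hand. A direct `ByteBuffer`, the kind often handed to native code, is set to little-endian order since WebAssembly memory is little-endian (`date-offsets` and `native-view` are illustrative names).

```clojure
(import 'java.nio.ByteBuffer
        'java.nio.ByteOrder)

;; Offsets of the :MyDate layout for a 32-bit environment, copied by hand
(def date-offsets
  {:year  0
   :month 2
   :day   3})

;; Direct buffer in little-endian order, the byte order of WebAssembly memory
(def ^ByteBuffer native-view
  (doto (ByteBuffer/allocateDirect 4)
    (.order ByteOrder/LITTLE_ENDIAN)))

;; Absolute writes at the computed offsets
(.putShort native-view (int (:year date-offsets)) (short 2021))
(.put native-view (int (:month date-offsets)) (byte 3))
(.put native-view (int (:day date-offsets)) (byte 16))

;; Little-endian: the low byte of 2021 (0x07E5) comes first
(bit-and (.get native-view (int 0)) 0xFF)  ;; => 229, i.e. 0xE5
(.get native-view (int 1))                 ;; => 7
```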

Development and testing

This repository is organized with Babashka, a wonderful tool for any Clojurist.

All tasks can be listed by running:

$ bb tasks

For instance, the task starting a Clojure development environment:

$ bb dev:clojure

License

Copyright © 2020 Adam Helinski and Contributors

Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.
