  • Language: JavaScript
  • License: ISC License

An efficient drop-in replacement for JSON.

JCOF: JSON-like Compact Object Format

A more efficient way to represent JSON-style objects.

Status

This format isn't nailed down yet. Most changes will likely be additive, such that existing JCOF documents will remain valid, but nothing is guaranteed. Use at your own risk. In its current form, JCOF is suitable for closed systems where one party controls every producer and consumer and where every implementation can be updated at once.

About

JCOF tries to be a drop-in replacement for JSON, with most of the same semantics, but with a much more compact representation of objects. The main way it does this is to introduce a string table at the beginning of the object, and then replace all strings with indexes into that string table. It also employs a few extra tricks to make objects as small as possible, without losing the most important benefits of JSON. Most importantly, it remains a text-based, schemaless format.

The following JSON object:

{
	"people": [
		{"first-name": "Bob", "age": 32, "occupation": "Plumber", "full-time": true},
		{"first-name": "Alice", "age": 28, "occupation": "Programmer", "full-time": true},
		{"first-name": "Bernard", "age": 36, "occupation": null, "full-time": null},
		{"first-name": "El", "age": 57, "occupation": "Programmer", "full-time": false}
	]
}

could be represented as the following JCOF object:

Programmer;"age""first-name""full-time""occupation";
{"people"[(0,iw"Bob"b"Plumber")(0,is"Alice"b,s0)(0,iA"Bernard"n,n)(0,iV"El"B,s0)]}

Minified, the JSON is 299 bytes, with 71.5 bytes on average per person object. The JCOF is 134 bytes, with only 17.5 bytes per person object; that's 0.45x the size in total, and 0.24x the size per person object. The reason the JCOF is so much smaller is threefold:

  1. It has a string table, so that strings which occur multiple times only have to be included in the JCOF document once. In this example object, the only duplicated string is "Programmer".
  2. It has an object shapes table, so that object shapes which occur multiple times only have to have their keys encoded once. In this example object, the only duplicated object shape is {"age", "first-name", "full-time", "occupation"}.
  3. It has more compact encodings for various values and syntax. Large integers can be encoded as base 62 rather than base 10, booleans and null are encoded using single characters, and separator characters can be skipped where that results in an unambiguous document.
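
The byte counts quoted above can be re-derived with a few lines of JavaScript. This is only a sanity check on the example, not part of the reference implementation; the variable names are made up for illustration, and the JCOF text is joined into a single string without the line break shown above.

```javascript
// The example document from above, as a plain JavaScript value.
const doc = {
  people: [
    { "first-name": "Bob", age: 32, occupation: "Plumber", "full-time": true },
    { "first-name": "Alice", age: 28, occupation: "Programmer", "full-time": true },
    { "first-name": "Bernard", age: 36, occupation: null, "full-time": null },
    { "first-name": "El", age: 57, occupation: "Programmer", "full-time": false },
  ],
};

// The JCOF text from above: string table and shapes table, then the value.
const header = 'Programmer;"age""first-name""full-time""occupation";';
const body = '{"people"[(0,iw"Bob"b"Plumber")(0,is"Alice"b,s0)(0,iA"Bernard"n,n)(0,iV"El"B,s0)]}';
const jcofText = header + body;

// Size in bytes when encoded as UTF-8.
const byteLength = (text) => new TextEncoder().encode(text).length;

const jsonBytes = byteLength(JSON.stringify(doc));
const jcofBytes = byteLength(jcofText);
const ratio = jcofBytes / jsonBytes;

// Per-person averages: subtract everything that is not part of a person
// object (separators between person objects count towards the persons).
const jsonPerPerson = (jsonBytes - byteLength('{"people":[]}')) / 4;
const jcofPerPerson = (jcofBytes - byteLength(header + '{"people"[]}')) / 4;

console.log(jsonBytes, jcofBytes, ratio.toFixed(3)); // 299 134 0.448
console.log(jsonPerPerson, jcofPerPerson);           // 71.5 17.5
```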

Rationale

I was making a JSON-based serialization format for a game I was working on, but found myself making trade-offs between space efficiency and descriptive key names, so I decided to make a format which makes that a non-issue. I then kept iterating on it until I had what I call JCOF today.

In most cases, you would use plain JSON, or if size is a concern, you would use gzipped JSON. But there are times when size is a concern and you can't reasonably use gzip; for example, gzipping stuff from JavaScript in the browser is inconvenient until TextEncoderStream is supported in Firefox, and having a smaller uncompressed encoding can be an advantage in some cases even where gzip is used. I've also observed significant reductions in size between compressed JSON and compressed JCOF in certain cases.

I'm publishing it because other people may find it useful too. If you don't find it useful, feel free to disregard it.

Reference implementations

Currently, the only reference implementation is the JavaScript one, in implementations/javascript/jcof.js. It's published on npm here: https://www.npmjs.com/package/jcof

Benchmarks

These are the sizes of various documents in JSON compared to JCOF (from the test suite):

tiny.json:
  JSON: 299 bytes
  JCOF: 134 bytes (0.448x)
circuitsim.json:
  JSON: 8315 bytes
  JCOF: 2093 bytes (0.252x)
pokemon.json:
  JSON: 219635 bytes
  JCOF: 39650 bytes (0.181x)
pokedex.json:
  JSON: 56812 bytes
  JCOF: 23132 bytes (0.407x)
madrid.json:
  JSON: 37960 bytes
  JCOF: 11923 bytes (0.314x)
meteorites.json:
  JSON: 244920 bytes
  JCOF: 87028 bytes (0.355x)
comets.json:
  JSON: 51949 bytes
  JCOF: 37480 bytes (0.721x)

The format

Here's the grammar which describes JCOF:

grammar ::= string-table ';' object-shape-table ';' value

string-table ::= (string (','? string)*)?
string ::= plain-string | json-string
plain-string ::= [a-zA-Z0-9]+
json-string ::= [https://datatracker.ietf.org/doc/html/rfc8259#section-7]

object-shape-table ::= (object-shape (',' object-shape)*)?
object-shape ::= object-key (':'? object-key)*
object-key ::= base62 | json-string
base62 ::= [0-9a-zA-Z]+

value ::=
  array-value |
  object-value |
  number-value |
  string-value |
  bool-value |
  null-value

array-value ::= '[' (value (','? value)*)? ']'
object-value ::= shaped-object-value | keyed-object-value
shaped-object-value ::= '(' base62 (','? value)* ')'
keyed-object-value ::= '{' (key-value-pair (','? key-value-pair)*)? '}'
key-value-pair ::= object-key ':'? value
number-value ::= 'i' base62 | 'I' base62 | 'finf' | 'fInf' | 'fnan' | float-value
float-value ::= '-'? [0-9]+ ('.' [0-9]+)? (('e' | 'E') ('-' | '+')? [0-9]+)?
string-value ::= 's' base62 | json-string
bool-value ::= 'b' | 'B'
null-value ::= 'n'

See the bottom of the readme for a railroad diagram.

In addition to the grammar, you should know the following:

Many separators are optional

The grammar contains optional separators (','?, ':'?). These separators can be skipped if either the character before or the character after is any of the following: [, ], {, }, (, ), ,, : or ". This saves a bunch of bytes. JCOF generators can choose to always emit separators, but parsers must accept JCOF documents with missing separators.
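
As a rough illustration of the rule above, a generator could decide whether to emit an optional separator by looking at the last character of the previous token and the first character of the next one. The helper names below (canSkipSeparator, joinValues) are hypothetical and not taken from the reference implementation, and the sketch assumes every token is a non-empty, already-encoded string.

```javascript
// Characters next to which an optional ',' or ':' may be dropped,
// as listed in the paragraph above.
const SKIPPABLE_NEIGHBOURS = new Set(['[', ']', '{', '}', '(', ')', ',', ':', '"']);

// Returns true when no separator is needed between two adjacent tokens.
function canSkipSeparator(prevToken, nextToken) {
  const before = prevToken[prevToken.length - 1];
  const after = nextToken[0];
  return SKIPPABLE_NEIGHBOURS.has(before) || SKIPPABLE_NEIGHBOURS.has(after);
}

// Joins already-encoded values, emitting ',' only where it is required.
function joinValues(tokens) {
  let out = '';
  tokens.forEach((token, i) => {
    if (i > 0 && !canSkipSeparator(tokens[i - 1], token)) out += ',';
    out += token;
  });
  return out;
}

console.log(joinValues(['0', 'iw', '"Bob"', 'b', '"Plumber"'])); // 0,iw"Bob"b"Plumber"
console.log(joinValues(['0', 'is', '"Alice"', 'b', 's0']));      // 0,is"Alice"b,s0
```

Both results match the contents of the first two shaped objects in the example document. The rule only covers the separators marked optional in the grammar; the , between object shapes is always required.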

The string table

All JCOF objects start with a string table, which is a list of strings separated by an optional ,.
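
A minimal sketch of how the string table might be read, following the string-table rule in the grammar. The function name parseStringTable and its return shape are assumptions made for this example; the reference implementation may be structured differently. JSON string literals are located by scanning for the closing quote and then handed to JSON.parse.

```javascript
// Reads the string table at the start of a JCOF document.
// Returns the decoded strings and the index of the terminating ';'.
function parseStringTable(src) {
  const strings = [];
  let i = 0;
  while (i < src.length && src[i] !== ';') {
    if (src[i] === ',') {
      // Optional separator between two strings.
      i += 1;
    } else if (src[i] === '"') {
      // JSON string: find the closing quote, skipping escaped characters.
      let j = i + 1;
      while (j < src.length && src[j] !== '"') {
        j += src[j] === '\\' ? 2 : 1;
      }
      if (j >= src.length) throw new SyntaxError('unterminated string');
      strings.push(JSON.parse(src.slice(i, j + 1)));
      i = j + 1;
    } else {
      // Plain string: one or more of [a-zA-Z0-9].
      const match = /^[a-zA-Z0-9]+/.exec(src.slice(i));
      if (!match) throw new SyntaxError(`unexpected character at ${i}`);
      strings.push(match[0]);
      i += match[0].length;
    }
  }
  if (i >= src.length) throw new SyntaxError("missing ';' after string table");
  return { strings, end: i };
}

const table = parseStringTable('Programmer;"age""first-name""full-time""occupation";');
console.log(JSON.stringify(table.strings)); // ["Programmer"]
```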

The object shapes table

An "object shape" is defined as a list of keys. If you have a bunch of objects with the same keys, it's usually advantageous to define that set of keys once in the object shapes table and encode the objects with the shaped object syntax. An object shape is a list of object keys optionally separated by :, and the object shapes table is a list of object shapes separated by a mandatory ,.
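
To make the idea concrete, here is a small sketch of how a decoder could turn a shaped object back into a regular object once the shapes table has been resolved to lists of key strings. The function name expandShapedObject is hypothetical, and requiring exactly one value per key is an assumption of this sketch rather than something the grammar states.

```javascript
// shapes: the object shapes table, already resolved to arrays of key strings.
// shapeIndex: the decoded base62 index that follows '(' in a shaped object.
// values: the decoded values that follow the index.
function expandShapedObject(shapes, shapeIndex, values) {
  const keys = shapes[shapeIndex];
  if (!keys) throw new RangeError(`unknown object shape ${shapeIndex}`);
  // Assumption: a shaped object carries exactly one value per key.
  if (values.length !== keys.length) {
    throw new RangeError(`expected ${keys.length} values, got ${values.length}`);
  }
  // Pair each key with the value at the same position.
  return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
}

// The single shape from the example document, and the values of (0,iw"Bob"b"Plumber").
const shapes = [['age', 'first-name', 'full-time', 'occupation']];
const person = expandShapedObject(shapes, 0, [32, 'Bob', true, 'Plumber']);
console.log(JSON.stringify(person));
// {"age":32,"first-name":"Bob","full-time":true,"occupation":"Plumber"}
```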

Base62

Base62 encoding just refers to writing integer numbers in base 62 rather than base 10. This lets us use 0-9, a-z and A-Z as digits. The characters from 0 to 9 represent 0-9, the characters a to z represent 10-35, and the characters A to Z represent 36-61.
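
The digit mapping above can be sketched as a pair of helper functions. The names toBase62 and fromBase62 are chosen for this example only, and the sketch is limited to non-negative safe integers.

```javascript
// Digit alphabet in the order described above: 0-9, then a-z, then A-Z.
const DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encodes a non-negative integer as a base62 string.
function toBase62(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer');
  }
  let out = '';
  do {
    out = DIGITS[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out;
}

// Decodes a base62 string back into a number.
function fromBase62(text) {
  if (!/^[0-9a-zA-Z]+$/.test(text)) throw new SyntaxError('not a base62 number');
  let n = 0;
  for (const ch of text) n = n * 62 + DIGITS.indexOf(ch);
  return n;
}

console.log(toBase62(32), toBase62(57), toBase62(62)); // w V 10
console.log(fromBase62('A'), fromBase62('ZZ'));        // 36 3843
```

The first two results are the ages of Bob and El as they appear in the example document (iw and iV).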

Values

A value can be:

  • An array literal: [, followed by 0 or more values, followed by ]
  • A shaped object literal: (, followed by an object shape index, followed by values, followed by )
    • The object shape index is a base62-encoded index into the object shapes table
  • An object literal: {, followed by 0 or more key-value pairs, followed by }
    • A key-value pair is an object key (a base62 index into the string table, or a JSON string literal), followed by an optional :, followed by a value
  • A string reference: s followed by a base62 index into the string table
  • A JSON string literal
  • A number literal:
    • i followed by a base62 number: A positive integer
    • I followed by a base62 number: A negative integer
    • finf, fInf or fnan: positive infinity, negative infinity or NaN, respectively
    • A floating point number written in decimal, with an optional fractional part and an optional exponent part
  • A bool literal: b for true, B for false
  • A null literal: n
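
As an illustration of the simpler literals in this list, the following sketch decodes a single, already-isolated token for bools, null, integers and decimal floats. The function name decodeScalar is made up for this example; strings, arrays, objects and the special float tokens finf, fInf and fnan from the grammar are deliberately left out.

```javascript
// Base62 digit alphabet: 0-9, then a-z, then A-Z.
const BASE62_DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Decodes one scalar token (bool, null, integer or decimal float).
function decodeScalar(token) {
  if (token === 'b') return true;
  if (token === 'B') return false;
  if (token === 'n') return null;

  // 'i' + base62 is a positive integer, 'I' + base62 a negative one.
  if (/^[iI][0-9a-zA-Z]+$/.test(token)) {
    let n = 0;
    for (const ch of token.slice(1)) n = n * 62 + BASE62_DIGITS.indexOf(ch);
    return token[0] === 'i' ? n : -n;
  }

  // Decimal float, following the float-value rule in the grammar.
  if (/^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/.test(token)) {
    return Number(token);
  }

  throw new SyntaxError(`not a scalar literal handled by this sketch: ${token}`);
}

console.log(decodeScalar('iw'), decodeScalar('IA'), decodeScalar('1.5e3'), decodeScalar('B'));
// 32 -36 1500 false
```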

Railroad diagram

[Image: railroad diagram of the grammar, generated with bnf-railroad-generator]
