  • Stars: 567
  • Rank 78,634 (Top 2 %)
  • Language: C++
  • License: MIT License
  • Created almost 12 years ago
  • Updated about 3 years ago

Repository Details

Lightweight, extremely high-performance JSON parser for C++11

Documentation

sajson

sajson is an extremely high-performance, in-place, DOM-style JSON parser written in C++.

Originally, sajson meant Single Allocation JSON, but it now supports dynamic allocation too.

Features

sajson parses an input document into a contiguous AST structure. Unlike some other high-performance JSON parsers, the AST is efficiently queryable. Object lookups by key are O(lg N) and array indexing is O(1).
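As an illustration of how O(lg N) key lookup can work, here is a minimal sketch that keeps object members sorted by key and binary-searches them. The `Member` type and both function names are hypothetical and are not part of sajson; the library's actual key ordering and storage layout may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed object: (key, value) pairs.
using Member = std::pair<std::string, int>;

// Sort once after parsing so that every later lookup can binary-search.
inline void sort_members(std::vector<Member>& members) {
    std::sort(members.begin(), members.end(),
              [](const Member& a, const Member& b) { return a.first < b.first; });
}

// O(lg N) lookup: returns a pointer to the value, or nullptr if the key is absent.
// Precondition: `members` has been sorted with sort_members().
inline const int* find_member(const std::vector<Member>& members,
                              const std::string& key) {
    auto it = std::lower_bound(
        members.begin(), members.end(), key,
        [](const Member& m, const std::string& k) { return m.first < k; });
    if (it != members.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

Array indexing needs no search at all: with a contiguous element table, element `i` is a single offset computation.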

sajson does not require that the input buffer is null-terminated. You can use it to parse straight out of a disk mmap or network buffer, for example.

sajson is in-situ: it modifies the input string. While parsing, string values are converted to UTF-8.

(Note: sajson pays a slight performance penalty for not requiring null termination of the input string. Because sajson is in-situ, many use cases require copying the input data anyway. Therefore, I could be convinced to add an option for requiring null termination.)
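To make "in-situ" concrete, the sketch below decodes a JSON string's escapes directly inside the input buffer. It is an illustration only: `unescape_in_place` is a hypothetical name, not a sajson function, and it handles just the single-character escapes. `\uXXXX` sequences, which are where the conversion to UTF-8 would happen, are rejected here for brevity.

```cpp
#include <cstddef>

// Decodes simple JSON escapes in place. `begin` points just past the opening
// quote and `end` is the end of the input buffer. On success, stores the
// decoded length in *out_length and returns true. The decoded bytes overwrite
// the escaped bytes starting at `begin`.
inline bool unescape_in_place(char* begin, char* end, std::size_t* out_length) {
    char* read = begin;
    char* write = begin;
    while (read != end) {
        char c = *read++;
        if (c == '"') {  // closing quote: the string is complete
            *out_length = static_cast<std::size_t>(write - begin);
            return true;
        }
        if (c == '\\') {
            if (read == end) return false;  // dangling backslash
            switch (*read++) {
                case '"':  c = '"';  break;
                case '\\': c = '\\'; break;
                case '/':  c = '/';  break;
                case 'b':  c = '\b'; break;
                case 'f':  c = '\f'; break;
                case 'n':  c = '\n'; break;
                case 'r':  c = '\r'; break;
                case 't':  c = '\t'; break;
                default:   return false;  // \uXXXX and invalid escapes: not handled in this sketch
            }
        }
        // A decoded string is never longer than its escaped form, so the
        // write cursor can never overtake the read cursor.
        *write++ = c;
    }
    return false;  // ran off the buffer without a closing quote
}
```

Because the buffer is overwritten, callers that need the original text afterwards have to parse a copy.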

Other Features

  • Single header file -- simply drop sajson.h into your project.
  • No exceptions, RTTI, or longjmp.
  • O(1) stack usage. No document will overflow the stack.
  • Only two number types: 32-bit integers and doubles.
  • Small code size -- suitable for Emscripten.
  • Has been fuzzed with American Fuzzy Lop.
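The two-number-type design in the list above can be sketched as a small classifier. This is an assumption-laden illustration, not sajson's number parser: `Number` and `classify_number` are hypothetical names, the input must be null-terminated (unlike sajson's), and the rule "integer literals that do not fit in 32 bits become doubles" is my assumption rather than something this README states.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical result type: either a 32-bit integer or a double.
struct Number {
    bool is_integer;
    std::int32_t i;  // meaningful only when is_integer is true
    double d;        // always holds the value as a double
};

// Classifies a null-terminated numeric literal.
inline Number classify_number(const char* text) {
    Number n = {false, 0, 0.0};
    char* end = nullptr;
    errno = 0;
    const long long v = std::strtoll(text, &end, 10);
    // Integer only if the whole literal was consumed and it fits in 32 bits.
    if (errno == 0 && end != text && *end == '\0' &&
        v >= INT32_MIN && v <= INT32_MAX) {
        n.is_integer = true;
        n.i = static_cast<std::int32_t>(v);
    }
    n.d = std::strtod(text, nullptr);
    return n;
}
```

With this rule, `42` is an integer, while `2147483648`, `1.5`, and `1e3` are all doubles.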

AST Structure

The parsed AST's size is computed as follows:

  • 2 words per string
  • 1 word per 32-bit integer value
  • 64 bits per floating point value
  • 1+N words per array, where N is the number of elements
  • 1+3N words per object, where N is the number of members

The values null, true, and false are encoded in tag bits and have no cost otherwise.
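The size rules above can be turned into a small calculator. `ValueCounts` and `ast_bytes` are hypothetical helpers, not sajson APIs. The sketch assumes that object keys are covered by the 3N term rather than counted as separate strings, and it ignores any fixed per-document overhead.

```cpp
#include <cstddef>

// Hypothetical tally of the values in a document (not a sajson type).
struct ValueCounts {
    std::size_t strings;         // string values (object keys excluded, by assumption)
    std::size_t integers;        // 32-bit integer values
    std::size_t doubles;         // floating point values
    std::size_t arrays;
    std::size_t array_elements;  // total across all arrays
    std::size_t objects;
    std::size_t object_members;  // total across all objects
};

// AST payload size in bytes for a given word size, per the list above.
inline std::size_t ast_bytes(const ValueCounts& c,
                             std::size_t word_bytes = sizeof(std::size_t)) {
    const std::size_t words =
        2 * c.strings
        + c.integers
        + (c.arrays + c.array_elements)        // 1+N words per array
        + (c.objects + 3 * c.object_members);  // 1+3N words per object
    return words * word_bytes + 8 * c.doubles; // doubles are always 64 bits
}
```

For `{"a": [1, 2.5, "x"], "b": null}` the counts are 1 string, 1 integer, 1 double, 1 array with 3 elements, and 1 object with 2 members: 14 words plus one 64-bit double, which is 120 bytes with 8-byte words and 64 bytes with 4-byte words. The `null` costs nothing beyond its slot in the object.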

Allocation Modes

Single

The original sajson allocation mode allocates one word per byte of the input document. This is the fastest mode: because the AST and parse stack are guaranteed to fit, no allocation checks are required at runtime.

That is, on 32-bit platforms, sajson allocates 4 bytes per input character. On 64-bit platforms, sajson allocates 8 bytes per input character. Only use this parse mode if you can handle allocating the worst-case buffer size for your input documents.
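A quick way to check whether single-allocation mode is affordable is to compute the worst case up front. The helper below is a hypothetical convenience, not part of sajson; it just applies the one-word-per-input-byte rule.

```cpp
#include <cstddef>

// Worst-case buffer size, in bytes, for single-allocation mode:
// one machine word per byte of input.
inline std::size_t single_allocation_bytes(
        std::size_t input_bytes,
        std::size_t word_bytes = sizeof(std::size_t)) {
    return input_bytes * word_bytes;
}

// True if the worst case fits within a caller-chosen memory budget.
inline bool fits_budget(std::size_t input_bytes, std::size_t budget_bytes,
                        std::size_t word_bytes = sizeof(std::size_t)) {
    return single_allocation_bytes(input_bytes, word_bytes) <= budget_bytes;
}
```

For example, a 1 MiB document needs an 8 MiB buffer on a 64-bit platform and a 4 MiB buffer on a 32-bit one.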

Dynamic

The dynamic allocation mode grows the parse stack and AST buffer as needed. It's about 10-40% slower than single allocation because it needs to check for out-of-memory every time data is appended, and occasionally the buffers need to be reallocated and copied.
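The cost described above comes from two places, which this minimal sketch makes visible: a capacity check on every append, and an occasional reallocate-and-copy. `WordBuffer` is a hypothetical type written for illustration; the doubling growth policy is my assumption and not necessarily what sajson does.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical growable word buffer (not sajson's internal type).
class WordBuffer {
public:
    WordBuffer() : data_(nullptr), size_(0), capacity_(0) {}
    ~WordBuffer() { std::free(data_); }
    WordBuffer(const WordBuffer&) = delete;
    WordBuffer& operator=(const WordBuffer&) = delete;

    // Returns false on out-of-memory. The capacity check runs on every append,
    // which is the per-write overhead single allocation avoids.
    bool push(std::size_t word) {
        if (size_ == capacity_ && !grow()) {
            return false;
        }
        data_[size_++] = word;
        return true;
    }

    std::size_t size() const { return size_; }
    std::size_t operator[](std::size_t i) const { return data_[i]; }

private:
    // Doubles the capacity; realloc may move and copy the existing contents.
    bool grow() {
        const std::size_t new_capacity = capacity_ ? capacity_ * 2 : 16;
        void* p = std::realloc(data_, new_capacity * sizeof(std::size_t));
        if (!p) {
            return false;  // the old buffer stays valid and is freed by the destructor
        }
        data_ = static_cast<std::size_t*>(p);
        capacity_ = new_capacity;
        return true;
    }

    std::size_t* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```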

Bounded

The bounded allocation mode takes a fixed-size memory buffer and uses it for both the parse stack and the resulting AST. If the parse stack and AST fit in the given buffer, the parse succeeds. This allocation mode allows using sajson without the library making any allocations.
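One way to share a single fixed buffer between a parse stack and an AST is to grow them from opposite ends and fail when they meet. The sketch below shows that idea; `BoundedArena` is hypothetical, and the opposite-ends layout is an assumption made for illustration rather than a description of sajson's internals.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity arena (not sajson's internal type): the parse
// stack grows up from the front, the AST grows down from the back.
class BoundedArena {
public:
    BoundedArena(std::size_t* buffer, std::size_t length)
        : buffer_(buffer), stack_top_(0), ast_begin_(length) {}

    // Pushes onto the parse stack; fails if the two regions would collide.
    bool push_stack(std::size_t word) {
        if (stack_top_ == ast_begin_) return false;
        buffer_[stack_top_++] = word;
        return true;
    }

    // Pops from the parse stack; fails if the stack is empty.
    bool pop_stack(std::size_t* out) {
        if (stack_top_ == 0) return false;
        *out = buffer_[--stack_top_];
        return true;
    }

    // Writes one AST word; fails if the two regions would collide.
    bool write_ast(std::size_t word) {
        if (stack_top_ == ast_begin_) return false;
        buffer_[--ast_begin_] = word;
        return true;
    }

    std::size_t free_words() const { return ast_begin_ - stack_top_; }

private:
    std::size_t* buffer_;     // caller-owned storage; never allocated or freed here
    std::size_t stack_top_;   // one past the last stack word
    std::size_t ast_begin_;   // index of the first AST word
};
```

The caller owns the storage, so the arena itself never allocates: a parse either fits or reports failure.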

Performance

sajson's performance is excellent: it frequently benchmarks faster than RapidJSON, for example.

Implementation details are available at http://chadaustin.me/tag/sajson/.

Documentation

API documentation is available at http://chadaustin.github.io/sajson/doxygen/

Downsides / Missing Features

  • sajson does not support UTF-16 or UTF-32. However, I have never seen one of those in the wild, so I suspect they may be a case of aggressive overspecification. Some JSON specifications indicate that UTF-8 is the only valid encoding. Either way, just transcode to UTF-8 first.

  • No support for 64-bit integers. If this is something you want, just ask. (There's little technical reason not to support it. It's just that most people avoid 64-bit integers in JSON because JavaScript can't read them.)

  • Requires C++11. Some of the ownership semantics were awkward to express in C++03.
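The 64-bit integer point in the list above comes down to JavaScript storing every Number as an IEEE-754 double, which represents integers exactly only up to 2^53. A small check makes this concrete; `survives_double_round_trip` is a hypothetical helper, not part of sajson.

```cpp
#include <cstdint>

// True if `value` is unchanged after a round trip through a double,
// which is the representation JavaScript uses for Number.
inline bool survives_double_round_trip(std::int64_t value) {
    const double d = static_cast<double>(value);
    // 2^63 does not fit in int64_t; converting it back would be undefined.
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return static_cast<std::int64_t>(d) == value;
}
```

Here 9007199254740992 (2^53) survives, while 9007199254740993 (2^53 + 1) does not, so a 64-bit ID sent as a JSON number can silently change value in a JavaScript consumer.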

More Repositories

  1. is-it-snappy (Swift, 106 stars): iOS App for measuring input-to-output latency
  2. ibb (Python, 31 stars): I/O-Bound Build
  3. wired-sculpt-pcb (27 stars): KiCad design files for the Wired Sculpt conversion mod
  4. buffer-builder (Haskell, 25 stars): Haskell library for efficiently building up buffers
  5. Web-Benchmarks (HTML, 19 stars)
  6. jnd (Python, 11 stars): JSON Never Dies - a binary JSON encoding experiment
  7. 2016frontend (TypeScript, 10 stars): Template for a Typical Frontend Stack as of 2016
  8. CPUInfo (C++, 7 stars): queries x86 processors for their capabilities, including cpu speed, cache sizes, and instruction sets
  9. gmail-mbox-analyzer (Rust, 7 stars): Convert an mbox file to SQLite and find which emails are taking space
  10. qbjs (6 stars): Steve Hanov's QBasic on JavaScript
  11. renaissance (C, 6 stars): Functional GPU shader language
  12. photohash (Rust, 5 stars): Tool for detecting duplicate photos and diffing directories
  13. qmk_firmware_sculpt (C, 4 stars): Fork of qmk/qmk_firmware to add support for Microsoft Wired Sculpt wired conversion mod
  14. iPodExtract (Python, 3 stars): Extracts and reindexes MP3s from fifth generation iPods
  15. audiere (HTML, 3 stars): Audiere - Imported from SourceForge
  16. FlushMem (C++, 3 stars): Allocates as much memory as possible, forcing Windows to flush the disk cache
  17. zopfli-proxy (Haskell, 3 stars): HTTP proxy that recompresses with zopfli
  18. corona (C, 3 stars): Corona Image I/O Library
  19. glee (C, 3 stars): This is an old fork of some code I don't maintain. Use the upstream library instead.
  20. nehe-emscripten (C++, 2 stars): Testing Emscripten/Regal with NeHe Lessons
  21. sphere (C++, 2 stars): Sphere RPG Engine - Imported from SourceForge
  22. AstroMenaceEmscripten (C++, 2 stars)
  23. isugamedev (C++, 1 star): Iowa State Game Developers Club - Imported from SourceForge
  24. mciconsole (C++, 1 star): Command-line interface to the Windows Media Control Interface (MCI) API
  25. batch-channel (Rust, 1 star): Async channels for Rust that support reading and writing many values
  26. wakerset (Rust, 1 star): Zero-allocation Waker set for implementing async data structures
  27. misc (C++, 1 star): Little programs I've written