  • Language: Python
  • License: MIT License
  • Created about 7 years ago
  • Updated over 1 year ago

Exposing problems in JSON parsers of several programming languages.

Nesting levels for JSON parsers

Documenting how JSON parsers of several programming languages deal with deeply nested structures.

Introduction

Many JSON parsers (and many parsers in general) use recursion to parse nested structures. This is convenient when writing the parser, but it limits what the parser can handle: the size of the call stack is usually several orders of magnitude smaller than the available RAM, so an input that requires too many levels of recursion makes the program fail.
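To make the stack dependence concrete, here is a minimal sketch of a recursive-descent parser for the simplest possible input, a single chain of nested arrays such as `[[[]]]`. It is not code from any of the tested libraries, and the names `parse_nested` and `depth_ok` are hypothetical. Each `[` adds one Python call frame, so once the depth exceeds the interpreter's recursion limit (`sys.getrecursionlimit()`, 1000 by default in CPython), parsing fails with `RecursionError` even though the input is tiny compared with the available RAM.

```python
import sys


def parse_nested(text: str, pos: int = 0) -> int:
    """Parse a chain of nested empty arrays starting at text[pos].

    Returns the index just past the matching ']'. Each nesting level
    costs one frame on the call stack.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    if pos < len(text) and text[pos] == "[":
        pos = parse_nested(text, pos)  # one recursive call per level
    if pos >= len(text) or text[pos] != "]":
        raise ValueError(f"expected ']' at position {pos}")
    return pos + 1


def depth_ok(depth: int) -> bool:
    """Return True if parse_nested accepts `depth` levels of nesting."""
    document = "[" * depth + "]" * depth
    try:
        return parse_nested(document) == len(document)
    except RecursionError:
        return False


print(sys.getrecursionlimit())  # the interpreter's frame budget
print(depth_ok(100))      # True
print(depth_ok(100_000))  # False: the call stack limit is hit first
```

The 100,000-level document is only 200 KB, yet this parser cannot handle it.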

The two most recent JSON standards, RFC 8259 and RFC 7159, both say: "An implementation may set limits on the maximum depth of nesting". The ECMA-404 specification, however, does not set any limit on how deeply JSON structures can be nested.

This means that no level of nesting is defined as correct or incorrect by the JSON specifications, and JSON parsers may differ in how they handle nested structures.

Some recursive parser libraries implement a safety check to avoid crashing the calling program: they artificially limit the maximum depth they accept (often making that limit configurable), hoping that the stack space already in use when they are called, plus the stack space needed to reach that limit, will always fit within the total stack size. This limit is an arbitrary choice of the library implementer, and it explains all the lower values in the comparison below.
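The safety check described above can be sketched on a toy recursive parser for chains of nested arrays. The hypothetical `max_depth` parameter plays the role of options such as Ruby's `:max_nesting` or PHP's `$depth`: input deeper than the limit is rejected with an ordinary exception before the real call stack runs out. The default of 512 is arbitrary, and it only stays safe as long as it is well below the interpreter's recursion limit.

```python
class NestingTooDeep(ValueError):
    """Clean error raised when the input exceeds the artificial depth limit."""


def parse_nested_limited(text: str, pos: int = 0, depth: int = 1,
                         max_depth: int = 512) -> int:
    """Parse a chain of nested empty arrays, refusing more than max_depth levels.

    Returns the index just past the matching ']'.
    """
    if depth > max_depth:
        raise NestingTooDeep(f"nesting depth exceeds {max_depth}")
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    if pos < len(text) and text[pos] == "[":
        # Still recursive: the limit only decides when to stop early.
        pos = parse_nested_limited(text, pos, depth + 1, max_depth)
    if pos >= len(text) or text[pos] != "]":
        raise ValueError(f"expected ']' at position {pos}")
    return pos + 1


try:
    parse_nested_limited("[" * 600 + "]" * 600)
except NestingTooDeep as error:
    print(error)  # nesting depth exceeds 512
```

Raising the limit does not remove the underlying problem: with `max_depth=10_000`, a 5,000-level document makes this parser fail with `RecursionError` rather than `NestingTooDeep`, which is the situation the footnote under the results table warns about.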

Some parsers do not use the operating system stack at all to parse nested structures (they usually implement a state machine instead). These can usually accept arbitrarily deeply nested structures. Of course, non-streaming parsers cannot be given infinitely large inputs, and thus cannot produce infinitely large outputs.
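A parser that keeps the open arrays in an explicit, heap-allocated stack instead of recursing can be sketched as follows. It is a toy that only understands arrays (with commas and whitespace), not an excerpt from any library in the results table, and the names `parse_arrays_iteratively` and `nesting_depth` are hypothetical. Because each extra level only grows a Python list, the limit becomes the available memory rather than the call stack.

```python
def parse_arrays_iteratively(text: str) -> list:
    """Parse a JSON document made only of arrays, e.g. "[[], [[]]]".

    Open arrays live on an explicit stack (a Python list on the heap),
    so nesting depth is not bounded by the call stack.
    """
    stack = []      # arrays opened but not yet closed
    root = None
    last = "start"  # previous token: "start", "open", "close" or "comma"
    for pos, ch in enumerate(text):
        if ch in " \t\r\n":
            continue
        if ch == "[" and last in ("start", "open", "comma"):
            new = []
            if stack:
                stack[-1].append(new)  # nested inside the innermost open array
            else:
                root = new             # the top-level value
            stack.append(new)
            last = "open"
        elif ch == "]" and stack and last in ("open", "close"):
            stack.pop()
            last = "close"
        elif ch == "," and stack and last == "close":
            last = "comma"
        else:
            raise ValueError(f"unexpected {ch!r} at position {pos}")
    if stack or root is None:
        raise ValueError("unexpected end of input")
    return root


def nesting_depth(value) -> int:
    """Depth of the first-element chain, computed without recursion."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


deep = parse_arrays_iteratively("[" * 100_000 + "]" * 100_000)
print(nesting_depth(deep))                     # 100000
print(parse_arrays_iteratively("[[], [[]]]"))  # [[], [[]]]
```

The same 100,000-level input that defeats a recursive parser is handled here; printing `deep` itself is avoided because `repr` of such a structure is recursive.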

Note that parsers that set an arbitrary limit on the input nesting level are not safer, and do not provide stronger guarantees on memory consumption, than parsers that can handle arbitrarily nested input: both consume an amount of resources proportional to the size of their input.

This repository contains tools to measure the nesting limits of JSON parsers of different languages.

How to use

This repository contains a script called test_parser.py that takes a JSON parser, uses binary search to find the smallest JSON structure it fails to parse, and prints that structure's nesting level.

The JSON parser must be a program that reads JSON from its standard input and exits with status 0 if it managed to parse it, or with any other status if an error occurred.
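A minimal sketch of how a harness can drive a parser through this interface with Python's `subprocess` module. The helper `parser_accepts` and the constant `PYTHON_JSON_PARSER` are hypothetical, and test_parser.py's actual code may differ; the sketch relies only on the contract above (JSON on standard input, exit status 0 for success). A crash also counts as a failure: on POSIX, `subprocess` reports death by a signal such as a segfault as a negative return code. Treating a timeout as a failure is an assumption of this sketch.

```python
import subprocess
import sys

# A tiny parser program obeying the contract: it uses Python's built-in json
# module, and any uncaught exception makes the interpreter exit with status 1.
PYTHON_JSON_PARSER = [sys.executable, "-c", "import json, sys; json.load(sys.stdin)"]


def parser_accepts(command: list, document: str, timeout: float = 60.0) -> bool:
    """Return True if `command` reads `document` on stdin and exits with status 0."""
    try:
        completed = subprocess.run(
            command,
            input=document.encode("utf-8"),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # assumption: a hung parser is reported as a failure
    return completed.returncode == 0


print(parser_accepts(PYTHON_JSON_PARSER, "[[[]]]"))  # True
print(parser_accepts(PYTHON_JSON_PARSER, "[[["))     # False
```

A `RecursionError` raised inside the child is just another uncaught exception, so it also ends with status 1 and counts as a failure.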

How it works

test_parser.py builds JSON structures composed solely of nested arrays and feeds them to the program under test. For instance, for a depth of 3, it builds the following JSON: [[[]]]. This produces a structure of only 2n bytes with n nesting levels. It then uses binary search to find the smallest structure for which the program fails.
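The generation and search steps can be sketched as below. The names `nested_arrays` and `smallest_failing_depth` are hypothetical; the sketch assumes failures are monotonic (if depth n fails, every larger depth fails too), and the doubling phase that finds an upper bound before the binary search is a choice of this sketch, not necessarily what test_parser.py does.

```python
from typing import Callable, Optional


def nested_arrays(depth: int) -> str:
    """Build a JSON document of `depth` nested arrays in 2 * depth bytes."""
    return "[" * depth + "]" * depth


def smallest_failing_depth(accepts: Callable[[int], bool],
                           max_depth: int = 2**24) -> Optional[int]:
    """Return the smallest depth n >= 1 for which accepts(n) is False.

    Assumes failure is monotonic. Returns None if every depth up to
    max_depth is accepted.
    """
    good = 0  # every depth <= good is accepted (0 means none tested yet)
    bad = 1   # candidate upper bound
    # Doubling phase: grow the bound until the parser fails.
    while accepts(bad):
        good = bad
        if bad >= max_depth:
            return None
        bad = min(bad * 2, max_depth)
    # Binary search phase: depth `good` is accepted, depth `bad` fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        if accepts(mid):
            good = mid
        else:
            bad = mid
    return bad


# A pretend parser that accepts up to 64 levels of nesting.
print(smallest_failing_depth(lambda n: n <= 64))  # 65
print(nested_arrays(3))                           # [[[]]]
```

In a real run, `accepts(n)` would write `nested_arrays(n)` to the parser program's standard input and check for exit status 0. Doubling first means the search needs roughly 2 log2(n) parser runs without knowing an upper bound in advance; the default `max_depth` of 2**24 (a 32 MiB document) is an arbitrary cap chosen for this sketch.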

Results

The implementations in this repository are continuously tested by Travis CI on a virtual machine running Ubuntu 18.04, with 8 GB of RAM and a maximum stack size of 8,192 KB.

Here are the results, sorted from the lowest nesting level accepted by default to the highest:

| language | JSON library | nesting level | file size | notes |
|---|---|---|---|---|
| C# | System.Text.Json | 65 | 130 bytes | configurable (JsonSerializerOptions.MaxDepth) * |
| ruby | json | 101 | 202 bytes | configurable (:max_nesting) * |
| rust | serde_json | 128 | 256 bytes | disableable (disable_recursion_limit) * |
| shell | jq | 257 | 514 bytes | undocumented |
| php | json_decode | 512 | 1.0 KB | configurable ($depth) * |
| perl | JSON::PP | 513 | 1.0 KB | configurable (max_depth) * |
| swift | JSONDecoder | 514 | 1.0 KB | undocumented |
| python3 | json | 995 | 2.0 KB | configurable (sys.setrecursionlimit) *, undocumented |
| C | jansson | 2049 | 4.0 KB | |
| javascript | JSON.parse | 5712 | 11.4 KB | Node.js 8 LTS |
| java | Gson | 6100 | 12 KB | |
| java | Jackson | 6577 | 13 KB | |
| go | json-iterator | 10002 | 20 KB | configurable (Config.MaxDepth) * |
| PostgreSQL | json type | 11887 | 23 KB | configurable (max_stack_depth), undocumented |
| D | std.json | 37370 | 74.7 KB | segfaults |
| C++ | RapidJSON | 87266 | 175 KB | segfaults |
| Nim | json | 104769 | 209 KB | segfaults |
| OCaml | yojson | 130380 | 260 KB | |
| go | encoding/json | 1973784 | 3.9 MiB | fatal error, goroutine stack exceeds 1000000000-byte limit |
| C++ | JSON for Modern C++ | ∞ | ∞ | segfault fixed in v3.7.2 |
| C# | Newtonsoft.Json | ∞ | ∞ | |
| ruby | Oj | ∞ | ∞ | |
| Haskell | Aeson | ∞ | ∞ | |

* "Configurable" and "disableable" only mean that the depth check inside the parser itself can be configured or disabled, not that the parser can be made to accept any nesting depth. When the limit is disabled or raised too far, the parser crashes the calling program instead of returning a clean error.

Remarks

I tried to test the most popular JSON library of each language. If you want to add a new language or library, feel free to open a pull request. All parameters were left at their default values.
