  • Stars: 349
  • Rank: 121,528 (Top 3 %)
  • Language: C++
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated 8 months ago

Repository Details

Very fast Python JSON parsing library

cysimdjson

Fast JSON parsing library for Python, 7-12 times faster than the standard Python JSON parser.
It provides Python bindings for simdjson, built with Cython.

The standard Python JSON parser (json.load() etc.) is relatively slow, and if you need to parse large JSON files or a large number of small JSON files, it can become a significant bottleneck.

Whilst there are other fast Python JSON parsers, such as pysimdjson, libpy_simdjson or orjson, they don't reach the raw speed provided by the brilliant SIMDJSON project. SIMDJSON is a C++ JSON parser based on SIMD instructions, reportedly the fastest JSON parser on the planet.

Usage

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes)

# Access using JSON Pointer
print(json_element.at_pointer("/foo/2/0"))

Note: the parser object can be reused for maximum performance.
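
To make the JSON Pointer path in the example above concrete, here is a minimal pure-Python sketch of how a pointer such as "/foo/2/0" walks a document. It uses only the standard json module and a hypothetical helper named resolve_pointer; it is an illustration of the pointer syntax, not how cysimdjson implements at_pointer.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer against plain Python dicts and lists.

    Hypothetical helper for illustration only; not part of cysimdjson.
    """
    if pointer == "":
        # The empty pointer refers to the whole document.
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")

    current = document
    for token in pointer[1:].split("/"):
        # Unescape the two JSON Pointer escapes: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array step: token is an index
        else:
            current = current[token]       # object step: token is a key
    return current


document = json.loads(b'{"foo": [1, 2, [3]]}')
print(resolve_pointer(document, "/foo/2/0"))  # prints 3
```

Each "/"-separated token selects either an object key or an array index, so "/foo/2/0" means: key "foo", then index 2, then index 0.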

Pythonic drop-in API

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access using dictionary-style indexing
print(json_parsed['foo'])

The json_parsed object is a read-only, dictionary-like object that provides access to the JSON data.

Trade-offs

The speed of cysimdjson is based on these assumptions:

  1. The output of the parser is read-only; you cannot modify it
  2. The output of the parser is not a Python dictionary, but a lazily evaluated dictionary-like object
  3. If you convert the parser output into a Python dictionary, you will lose the speed advantage

If your design is not aligned with these assumptions, cysimdjson is not a good choice.
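
The assumptions above can be pictured with a small pure-Python sketch of a read-only, lazily wrapping, dictionary-like view. The class ReadOnlyView and its to_dict method are hypothetical names chosen for illustration; the real cysimdjson objects are implemented in Cython on top of SIMDJSON and may behave differently in detail.

```python
import copy
from collections.abc import Mapping


class ReadOnlyView(Mapping):
    """Hypothetical read-only, dictionary-like view over already parsed data."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Nested objects are wrapped only when accessed (lazy evaluation).
        return ReadOnlyView(value) if isinstance(value, dict) else value

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def to_dict(self):
        # Eagerly copies the whole structure; a conversion of this kind
        # is where the benefit of lazy access would be lost.
        return copy.deepcopy(self._data)


view = ReadOnlyView({"foo": {"bar": 1}})
print(view["foo"]["bar"])  # prints 1

try:
    view["foo"] = 2  # Mapping defines no __setitem__, so this fails
except TypeError as exc:
    print("read-only:", exc)
```

Because Mapping provides no item assignment, any attempt to modify the view raises TypeError, which mirrors the first assumption in the list.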

Documentation

JSONParser.parse(json_bytes)

Parse JSON json_bytes, represented as bytes.

JSONParser.parse_in_place(bytes)

Parse JSON passed as bytes, assuming that the buffer already contains the padding expected by SIMDJSON. This is the fastest parsing variant.

JSONParser.parse_string(string)

Parse JSON passed as string, represented as str.

JSONParser.load(path)

Parse JSON from the file located at path.

Installation

pip3 install cysimdjson

Project cysimdjson is distributed via PyPI: https://pypi.org/project/cysimdjson/

If you want to install cysimdjson from source, you need to install Cython first: pip3 install cython.

Performance

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          510291.81 EPS (  1.00)  1223.17 MB/s
* libpy_simdjson loads      374615.54 EPS (  1.36)   897.95 MB/s
* pysimdjson parse          362195.46 EPS (  1.41)   868.18 MB/s
* orjson loads              110615.70 EPS (  4.61)   265.15 MB/s
* python json loads          72096.80 EPS (  7.08)   172.82 MB/s
----------------------------------------------------------------

SIMDJSON: 543335.93 EPS, 1241.52 MB/s
----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2556.10 EPS (  1.00)  1614.22 MB/s
* libpy_simdjson loads        2444.53 EPS (  1.05)  1543.76 MB/s
* pysimdjson parse            2415.46 EPS (  1.06)  1525.40 MB/s
* orjson loads                 387.11 EPS (  6.60)   244.47 MB/s
* python json loads            278.63 EPS (  9.17)   175.96 MB/s
----------------------------------------------------------------

SIMDJSON: 2536.16 EPS,  1527.28 MB/s
----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             284.67 EPS (  1.00)   640.81 MB/s
* pysimdjson parse             284.62 EPS (  1.00)   640.70 MB/s
* libpy_simdjson loads         277.13 EPS (  1.03)   623.84 MB/s
* orjson loads                  81.80 EPS (  3.48)   184.13 MB/s
* python json loads             22.52 EPS ( 12.64)    50.68 MB/s
----------------------------------------------------------------

SIMDJSON: 307.95 EPS, 661.08 MB/s
----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             775.61 EPS (  1.00)  2581.09 MB/s
* pysimdjson parse             743.67 EPS (  1.04)  2474.81 MB/s
* libpy_simdjson loads         654.15 EPS (  1.19)  2176.88 MB/s
* orjson loads                 166.67 EPS (  4.65)   554.66 MB/s
* python json loads            113.72 EPS (  6.82)   378.43 MB/s
----------------------------------------------------------------

SIMDJSON: 703.59 EPS, 2232.92 MB/s
----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         3972376.53 EPS (  1.00)    27.81 MB/s
* orjson loads             3637369.63 EPS (  1.09)    25.46 MB/s
* libpy_simdjson loads     1774211.19 EPS (  2.24)    12.42 MB/s
* pysimdjson parse          977530.90 EPS (  4.06)     6.84 MB/s
* python json loads         527932.65 EPS (  7.52)     3.70 MB/s
----------------------------------------------------------------

SIMDJSON: 3799392.10 EPS

CPU: AMD EPYC 7452
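
The columns in the tables above are related by simple arithmetic, which the following sketch checks against the 'jsonexamples/test.json' rows. It assumes that EPS counts complete parses of the file per second and that MB means 10^6 bytes; both assumptions are inferred from the numbers, not stated by the benchmark. The function names are hypothetical.

```python
def throughput_mb_per_s(eps, size_bytes):
    """Throughput in MB/s (1 MB = 10**6 bytes) from parses per second."""
    return eps * size_bytes / 1_000_000


def slowdown(fastest_eps, eps):
    """The parenthesised ratio: how many times slower than the fastest row."""
    return fastest_eps / eps


# 'jsonexamples/test.json' is 2397 bytes; cysimdjson reaches 510291.81 EPS.
print(round(throughput_mb_per_s(510291.81, 2397), 2))  # prints 1223.17

# python json loads reaches 72096.80 EPS in the same table.
print(round(slowdown(510291.81, 72096.80), 2))  # prints 7.08
```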

More performance testing:

The tests are reproducible:

pip3 install orjson
pip3 install pysimdjson
pip3 install libpy_simdjson
python3 setup.py build_ext --inplace
PYTHONPATH=. python3 ./perftest/test_benchmark.py
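
For readers who want a feel for how an EPS figure could be obtained, here is a minimal timing sketch that uses only the standard library. It is not the contents of perftest/test_benchmark.py; the helper measure_eps, the iteration count, and the sample payload are assumptions made for illustration.

```python
import json
import time


def measure_eps(parse, payload, iterations=10_000):
    """Return how many times per second parse(payload) completes.

    Hypothetical helper; a rough measurement without warm-up or repeats.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        parse(payload)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


payload = b'{"foo": [1, 2, [3]]}'
eps = measure_eps(json.loads, payload)
mb_per_s = eps * len(payload) / 1_000_000
print(f"python json loads {eps:12.2f} EPS {mb_per_s:8.2f} MB/s")
```

Passing a different callable as parse, for example the parse method of a reused parser object, would give a comparable figure for another library on the same payload.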

Manual build

python3 setup.py build_ext --inplace

More Repositories

  1. asab (HTML, 29 stars): Asynchronous Server App Boilerplate (ASAB) is a micro-service framework for Python 3 and asyncio.
  2. CatVision-io-SDK-Android (Java, 17 stars): Use CatVision.io SDK to add screen sharing of your Android application.
  3. Frame-Transporter (C, 13 stars): The Frame Transporter (aka libft) is an event-driven high-performance networking library for C and POSIX.
  4. seacat-auth (Python, 11 stars): SeaCat Auth provides authentication, authorization, identity management, session management and other access control features.
  5. c-its-itss (Python, 9 stars): C-ITS ITS-S reference implementation focused on a security meant for testing and studying of TeskaLabs SeaCat CA API
  6. coingame (Python, 5 stars): A fun game that illustrates the usage of asynchronous Python via the blockchain mining.
  7. CatVision-io-SDK-iOS (C, 5 stars): Use CatVision.io SDK to add screen sharing of your iOS application.
  8. seacat-auth-webui (JavaScript, 2 stars): User interface for TeskaLabs SeaCat Auth
  9. SeaCatTutorials (JavaScript, 2 stars): SeaCat Tutorials
  10. SeaCat-Volley-Android (Java, 1 star): SeaCat bridge for Google Volley on Android
  11. SeaCat-Client-Python3 (Python, 1 star): TeskaLabs SeaCat client for Python3 - high performance networking & cyber security
  12. seacat-admin-webui (JavaScript, 1 star): User Interface for TeskaLabs SeaCat Administration
  13. asab-iris (Python, 1 star): ASAB Iris is a microservice for rendering documents and sending them using email, SMS and instant messaging
  14. asab-webui (JavaScript, 1 star): React-based web UI for ASAB.
  15. go-asab (Go, 1 star): Asynchronous Server App Boilerplate (ASAB) is a micro-service framework for Go.
  16. SeaCat-Hessian-Android (Java, 1 star): SeaCat bridge for Hessian on Android
  17. SeaCat.io-Agent (C, 1 star): The agent application for SeaCat.io - IoT Device management
  18. splang-docs (Python, 1 star): SP-Lang documentation