Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.

JSON tricks (python)

The [pyjson-tricks] package brings several pieces of functionality to python handling of json files:

  1. Store and load numpy arrays in human-readable format.
  2. Store and load class instances both generic and customized.
  3. Store and load date/times as a dictionary (including timezone).
  4. Preserve map order {} using OrderedDict.
  5. Allow for comments in json files by starting lines with #.
  6. Sets, complex numbers, Decimal, Fraction, enums, compression, duplicate keys, pathlib Paths, bytes ...

As well as compression and disallowing duplicate keys.

Several keys of the format __keyname__ have special meanings, and more might be added in future releases.

If you're considering JSON-but-with-comments as a config file format, have a look at HJSON; it might be more appropriate. For other purposes, keep reading!

Thanks for all the Github stars!

Installation and use

You can install using

pip install json-tricks

Decoding of some data types needs the corresponding package to be installed, e.g. numpy for arrays, pandas for dataframes and pytz for timezone-aware datetimes.

You can import the usual json functions dump(s) and load(s), as well as a separate comment removal function, as follows:

from json_tricks import dump, dumps, load, loads, strip_comments

The exact signatures of these and other functions are in the documentation.

Several older versions of Python are supported. For an up-to-date list, see the automated tests.

Features

Numpy arrays

When not compressed, the array is encoded in a sort-of-readable, flexible and portable format, like so:

from numpy import arange, uint8
from json_tricks import dumps
arr = arange(0, 10, 1, dtype=uint8).reshape((2, 5))
print(dumps({'mydata': arr}))

this yields:

{
    "mydata": {
        "dtype": "uint8",
        "shape": [2, 5],
        "Corder": true,
        "__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}

which will be converted back to a numpy array when using json_tricks.loads. Note that the memory order (Corder) is only stored in v3.1 and later and for arrays with at least 2 dimensions.

As you see, this uses the magic key __ndarray__. Don't use __ndarray__ as a dictionary key unless you're trying to make a numpy array (and know what you're doing).

Numpy scalars are also serialized (v3.5+). They are represented by the closest python primitive type. A special representation was not feasible, because Python's json implementation serializes some numpy types as primitives, without consulting custom encoders. If you want to preserve the exact numpy type, use encode_scalars_inplace.

There is also a compressed format. From the next major release, this will be default when using compression. For now you can use it as:

dumps(data, compression=True, properties={'ndarray_compact': True})

This compressed format encodes the array data in base64, with gzip compression for the array, unless 1) compression has little effect for that array, or 2) the whole file is already compressed. If you only want compact format for large arrays, pass the number of elements to ndarray_compact.

Example:

data = [linspace(0, 10, 9), array([pi, exp(1)])]
dumps(data, compression=False, properties={'ndarray_compact': 8})

[{
   "__ndarray__": "b64.gz:H4sIAAAAAAAC/2NgQAZf7CE0iwOE5oPSIlBaEkrLQegGRShfxQEAz7QFikgAAAA=",
   "dtype": "float64",
   "shape": [9]
 }, {
   "__ndarray__": [3.141592653589793, 2.718281828459045],
   "dtype": "float64",
   "shape": [2]
 }]
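To make the "b64.gz:" idea more concrete, here is a minimal standard-library sketch of the general technique: pack the numbers as raw bytes, gzip them, and base64-encode the result. The byte layout (little-endian float64) and the helper names pack_floats and unpack_floats are assumptions made for illustration; this is not json-tricks's own code, and it does not use numpy.

```python
import base64
import gzip
import struct

PREFIX = "b64.gz:"  # prefix seen in the example output above


def pack_floats(values):
    """Pack floats as little-endian float64, gzip them, then base64-encode (assumed layout)."""
    raw = struct.pack("<%dd" % len(values), *values)
    return PREFIX + base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack_floats(text):
    """Reverse pack_floats: strip the prefix, base64-decode, gunzip, unpack."""
    if not text.startswith(PREFIX):
        raise ValueError("not a b64.gz payload")
    raw = gzip.decompress(base64.b64decode(text[len(PREFIX):]))
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


values = [i * 1.25 for i in range(9)]  # same values as linspace(0, 10, 9)
encoded = pack_floats(values)
decoded = unpack_floats(encoded)
```

The encoded string is plain ASCII, so it can sit inside a JSON document like any other string value.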

Class instances

json_tricks can serialize class instances.

If the class behaves normally (not generated dynamically, no __new__ or __metaclass__ magic, etc.) and all its attributes are serializable, then this should work by default.

# json_tricks/test_class.py
from json_tricks import dumps, loads

class MyTestCls:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

cls_instance = MyTestCls(s='ub', dct={'7': 7})

json = dumps(cls_instance, indent=4)
cls_instance_again = loads(json)

You'll get your instance back. Here the json looks like this:

{
   	"__instance_type__": [
   		"json_tricks.test_class",
   		"MyTestCls"
   	],
   	"attributes": {
   		"s": "ub",
   		"dct": {
   			"7": 7
   		}
   	}
}

As you can see, this stores the module and class name. The class must be importable from the same module when decoding (and should not have changed). If it isn't, you have to manually provide a dictionary to cls_lookup_map when loading, in which the class name can be looked up. Note that if the class is imported, then globals() is such a dictionary (so try loads(json, cls_lookup_map=globals())). Also note that if the class is defined in the 'top' script (the one you're calling directly), then this isn't a module and the import part cannot be extracted. Only the class name will be stored; it can then only be deserialized in the same script, or if you provide cls_lookup_map.
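The lookup-by-name idea can be illustrated with the standard json module alone. The sketch below is not how json-tricks is implemented; the Point class and the helpers encode_instance and make_decoder are hypothetical names used only to show how a name-to-class dictionary lets a decoder rebuild an instance.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def encode_instance(obj):
    """json.dumps fallback: record module and class name next to the attributes."""
    return {"__instance_type__": [type(obj).__module__, type(obj).__name__],
            "attributes": dict(vars(obj))}


def make_decoder(cls_lookup_map):
    """Build an object_hook that restores instances via a name-to-class map."""
    def hook(dct):
        if "__instance_type__" not in dct:
            return dct
        _module, name = dct["__instance_type__"]
        cls = cls_lookup_map[name]
        obj = cls.__new__(cls)  # bypass __init__; attributes are restored directly
        obj.__dict__.update(dct["attributes"])
        return obj
    return hook


text = json.dumps(Point(1, 2), default=encode_instance)
restored = json.loads(text, object_hook=make_decoder({"Point": Point}))
```

Passing globals() instead of the explicit {"Point": Point} dictionary would work the same way here, since Point is defined at module level.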

Note that this also works with slots without having to do anything (thanks to koffie and dominicdoty), which encodes like this (custom indentation):

{
    "__instance_type__": ["module.path", "ClassName"],
    "slots": {"slotattr": 37},
    "attributes": {"dictattr": 42}
}

If the instance doesn't serialize automatically, or if you want custom behaviour, then you can implement __json_encode__(self) and __json_decode__(self, **attributes) methods, like so:

class CustomEncodeCls:
    def __init__(self):
        self.relevant = 42
        self.irrelevant = 37

    def __json_encode__(self):
        # should return primitive, serializable types like dict, list, int, string, float...
        return {'relevant': self.relevant}

    def __json_decode__(self, **attrs):
        # should initialize all properties; note that __init__ is not called implicitly
        self.relevant = attrs['relevant']
        self.irrelevant = 12

As you've seen, this uses the magic key __instance_type__. Don't use __instance_type__ as a dictionary key unless you know what you're doing.

Date, time, datetime and timedelta

Date, time, datetime and timedelta objects are stored as dictionaries of "day", "hour", "millisecond" etc keys, for each nonzero property.

The timezone name is also stored if it is set. You'll need to have pytz installed to use timezone-aware date/times; it's not needed for naive date/times.

{
    "__datetime__": null,
    "year": 1988,
    "month": 3,
    "day": 15,
    "hour": 8,
    "minute": 3,
    "second": 59,
    "microsecond": 7,
    "tzinfo": "Europe/Amsterdam"
}

This approach was chosen over timestamps for readability and consistency between date and time, and over a single string to prevent parsing problems and reduce dependencies. Note that if primitives=True, date/times are encoded as ISO 8601, but they won't be restored automatically.

Don't use __date__, __time__, __datetime__, __timedelta__ or __tzinfo__ as dictionary keys unless you know what you're doing, as they have special meaning.
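As a rough illustration of the "dictionary of nonzero properties" idea, here is a standard-library sketch for naive datetimes. The helper names datetime_to_dict and dict_to_datetime are made up for this example, and timezone handling is left out; json-tricks's actual encoder may differ in details.

```python
from datetime import datetime

FIELDS = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def datetime_to_dict(dt):
    """Store the magic key plus every nonzero field of a naive datetime."""
    out = {"__datetime__": None}
    for name in FIELDS:
        value = getattr(dt, name)
        if value:
            out[name] = value
    return out


def dict_to_datetime(dct):
    """Rebuild the datetime; fields that were omitted default to zero."""
    return datetime(**{name: dct.get(name, 0) for name in FIELDS})


original = datetime(2017, 1, 19, 23)
as_dict = datetime_to_dict(original)
restored = dict_to_datetime(as_dict)
```

Because zero-valued fields are skipped, a datetime at exactly 23:00:00 only needs the year, month, day and hour keys.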

Order

Given an ordered dictionary like this (see the tests for a longer one):

ordered = OrderedDict((
    ('elephant', None),
    ('chicken', None),
    ('tortoise', None),
))

Converting to json and back will preserve the order:

from json_tricks import dumps, loads
json = dumps(ordered)
ordered = loads(json, preserve_order=True)

where preserve_order=True is added for emphasis; it can be left out since it's the default.

As a note on performance, both dicts and OrderedDicts have the same scaling for getting and setting items (O(1)). In Python versions before 3.5, OrderedDicts were implemented in Python rather than C, so were somewhat slower; since Python 3.5 both are implemented in C. In summary, you should have no scaling problems and probably no performance problems at all, especially in Python 3. Python 3.6+ preserves order of dictionaries by default making this redundant, but this is an implementation detail that should not be relied on.
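For comparison, the standard json module can also load into an OrderedDict when given an object_pairs_hook. This short sketch uses only the standard library and shows the general mechanism, not json-tricks itself:

```python
import json
from collections import OrderedDict

text = '{"elephant": null, "chicken": null, "tortoise": null}'

# object_pairs_hook receives the key/value pairs in document order
ordered = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(ordered)
```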

Comments

Warning: in the next major version, comment parsing will be opt-in, not default anymore (for performance reasons). Update your code now to pass ignore_comments=True explicitly if you want comment parsing.

This package uses # and // for comments, which seem to be the most common conventions, though only the latter is valid javascript.

For example, you could call loads on the following string:

{ # "comment 1
    "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,# comment" 2
    "quote": "\"th#t's\" what she said", // comment "3"
    "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7} #" comment 4 with quotes
}
// comment 5

And it would return the de-commented version:

{
    "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,
    "quote": "\"th#t's\" what she said",
    "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7}
}

Since comments aren't stored in the Python representation of the data, loading and then saving a json file will remove the comments (it also likely changes the indentation).

The implementation of comments is a bit crude, which means that there are some exceptional cases that aren't handled correctly (#57).

It is also not very fast. For that reason, if ignore_comments wasn't explicitly set to True, then json-tricks first tries to parse without ignoring comments. If that fails, then it will automatically retry with comment handling. This makes the no-comment case faster at the cost of the comment case, so if you are expecting comments, make sure to set ignore_comments to True.
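The "try plain parsing first, strip comments only on failure" strategy can be sketched with the standard library as follows. The comment stripper here is a simplified stand-in written for this example (the names strip_comments_sketch and loads_with_fallback are hypothetical); it is not the package's own strip_comments and makes no claim to handle every edge case.

```python
import json


def strip_comments_sketch(text):
    """Drop # and // comments that start outside string literals."""
    out = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "#" or line.startswith("//", i):
                cut = i  # comment runs to the end of the line
                break
        out.append(line[:cut])
    return "\n".join(out)


def loads_with_fallback(text):
    """Try plain JSON first; strip comments only if that fails."""
    try:
        return json.loads(text)
    except ValueError:
        return json.loads(strip_comments_sketch(text))


sample = '{ # a comment\n  "hello": "Wor#d", // another\n  "n": 5\n}'
result = loads_with_fallback(sample)
```

Input without comments never reaches the stripping step, which is why the comment-free case stays fast while commented input pays for two parse attempts.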

Other features

  • Special floats like NaN, Infinity and -0 using the allow_nan=True argument (non-standard json, may not decode in other implementations).
  • Sets are serializable and can be loaded. By default the set json representation is sorted, to have a consistent representation.
  • Save and load complex numbers (py3) with 1+2j serializing as {'__complex__': [1, 2]}.
  • Save and load Decimal and Fraction (including NaN, infinity, -0 for Decimal).
  • Save and load Enum (thanks to Jenselme), either built-in in python3.4+, or with the enum34 package in earlier versions. IntEnum needs encode_intenums_inplace.
  • json_tricks allows for gzip compression using the compression=True argument (off by default).
  • json_tricks can check for duplicate keys in maps by setting allow_duplicates to False. These are kind of allowed, but are handled inconsistently between json implementations. In Python, for dict and OrderedDict, duplicate keys are silently overwritten.
  • Save and load pathlib.Path objects (e.g., the current path, Path('.'), serializes as {"__pathlib__": "."}) (thanks to bburan).
  • Save and load bytes (python 3+ only), which will be encoded as utf8 if that is valid, or as base64 otherwise. Base64 is always used if primitives are requested. Serialized as [{"__bytes_b64__": "aGVsbG8="}] vs [{"__bytes_utf8__": "hello"}].
  • Save and load slices (thanks to claydugo).
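To show what the duplicate-key point means in practice, the following standard-library sketch contrasts the default "last value wins" behaviour with a hook that rejects duplicates. The reject_duplicates function is a hypothetical helper for this example, not part of json-tricks:

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % (key,))
        result[key] = value
    return result


text = '{"a": 1, "a": 2}'
silently_overwritten = json.loads(text)  # plain json keeps the last value

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```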

Preserve type vs use primitive

By default, types are encoded such that they can be restored to their original type when loaded with json-tricks. Example encodings in this documentation refer to that format.

You can also choose to store things as their closest primitive type (e.g. arrays and sets as lists, decimals as floats). This may be desirable if you don't care about the exact type, or you are loading the json in another language (which doesn't restore python types). It's also smaller.

To forgo meta data and store primitives instead, pass primitives=True to dump(s). This is available in version 3.8 and later. Example:

data = [
    arange(0, 10, 1, dtype=int).reshape((2, 5)),
    datetime(year=2017, month=1, day=19, hour=23, minute=00, second=00),
    1 + 2j,
    Decimal(42),
    Fraction(1, 3),
    MyTestCls(s='ub', dct={'7': 7}),  # see the class instances section above
    set(range(7)),
]
# Encode with metadata to preserve types when decoding
print(dumps(data))
// (comments added and indenting changed)
[
    // numpy array
    {
        "__ndarray__": [
            [0, 1, 2, 3, 4],
            [5, 6, 7, 8, 9]],
        "dtype": "int64",
        "shape": [2, 5],
        "Corder": true
    },
    // datetime (naive)
    {
        "__datetime__": null,
        "year": 2017,
        "month": 1,
        "day": 19,
        "hour": 23
    },
    // complex number
    {
        "__complex__": [1.0, 2.0]
    },
    // decimal & fraction
    {
        "__decimal__": "42"
    },
    {
        "__fraction__": true,
        "numerator": 1,
        "denominator": 3
    },
    // class instance
    {
        "__instance_type__": [
          "tests.test_class",
          "MyTestCls"
        ],
        "attributes": {
          "s": "ub",
          "dct": {"7": 7}
        }
    },
    // set
    {
        "__set__": [0, 1, 2, 3, 4, 5, 6]
    }
]
# Encode as primitive types; more simple but loses type information
print(dumps(data, primitives=True))
// (comments added and indentation changed)
[
    // numpy array
    [[0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9]],
    // datetime (naive)
    "2017-01-19T23:00:00",
    // complex number
    [1.0, 2.0],
    // decimal & fraction
    42.0,
    0.3333333333333333,
    // class instance
    {
        "s": "ub",
        "dct": {"7": 7}
    },
    // set
    [0, 1, 2, 3, 4, 5, 6]
]

Note that valid json is produced either way: json-tricks stores meta data as normal json, but other packages probably won't interpret it.
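The "closest primitive type" idea can be approximated with the standard json module and a default function. This is only a sketch of the concept under that assumption; the to_primitive helper is hypothetical and covers just a few of the types listed above, without numpy arrays or class instances.

```python
import json
from datetime import datetime
from decimal import Decimal
from fractions import Fraction


def to_primitive(obj):
    """json.dumps fallback that maps a few types to their closest JSON primitive."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    if isinstance(obj, complex):
        return [obj.real, obj.imag]
    if isinstance(obj, (Decimal, Fraction)):
        return float(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("cannot serialize %s" % type(obj).__name__)


data = [datetime(2017, 1, 19, 23), 1 + 2j, Decimal(42), Fraction(1, 3), set(range(7))]
text = json.dumps(data, default=to_primitive)
```

Loading text back with any JSON parser gives plain strings, lists and floats; the original types are gone, which is the trade-off described above.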

Usage & contributions

The code is under the Revised BSD License, so you can use it for most purposes, including commercially.

Contributions are very welcome! Bug reports, feature suggestions and code contributions help this project become more useful for everyone! There is a short contribution guide.

Contributors not yet mentioned: janLo (performance boost).

Tests

Tests are run automatically for commits to the repository for all supported versions.

To run the tests manually for your version, see this guide.
