  • Stars: 157
  • Rank 238,399 (Top 5 %)
  • Language: Python
  • Created about 8 years ago
  • Updated over 1 year ago

Repository Details

Compare some methods of array storage in Python (numpy)

Array storage benchmark

Compare the storage speed, retrieval speed and file size for various methods of storing 2D numpy arrays.

Hardware etc

The results here were obtained on an ordinary desktop PC that is several years old, runs Ubuntu, and uses an SSD for storage. You can easily run the benchmarks on your own PC to get results relevant to your hardware, and you can also apply them to your own data.
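As a rough illustration of what measuring one method on your own machine involves, here is a minimal sketch that times saving and loading a `.npy` file and records its size. It is not the repository's benchmark code: the function name `benchmark_npy`, the array shape, and the number of repeats are all assumptions made for this example.

```python
import os
import tempfile
import time

import numpy as np


def benchmark_npy(arr, repeats=3):
    """Return (best save seconds, best load seconds, file size in bytes) for .npy."""
    save_times, load_times = [], []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.npy")
        for _ in range(repeats):
            start = time.perf_counter()
            np.save(path, arr, allow_pickle=False)
            save_times.append(time.perf_counter() - start)

            start = time.perf_counter()
            loaded = np.load(path, allow_pickle=False)
            load_times.append(time.perf_counter() - start)

        size = os.path.getsize(path)
    # Sanity check: the round trip must reproduce the data exactly.
    assert np.array_equal(arr, loaded)
    return min(save_times), min(load_times), size


rng = np.random.default_rng(seed=0)
data = rng.random((1000, 400))  # hypothetical test shape, dense random doubles
save_s, load_s, size_b = benchmark_npy(data)
print(f"save {save_s:.4f}s, load {load_s:.4f}s, size {size_b} bytes")
```

Taking the minimum over a few repeats reduces noise from disk caching and other processes. Absolute numbers will still differ between machines.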

Methods

| Name | Description | Fast | Small^ | Portability | Ease of use | Human-readable | Flexible% | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Csv~ | comma separated value | ☐☐☐ | ☐☐☐ | ☒☒☒ | ☒☒☒ | ☒☒ | ☒☐ | only 2D |
| JSON~ | js object notation | ☐☐☐ | ☐☐☐ | ☒☒☐ | ☒☒☐ ++ | ☒☐ | ☒☒ | any dim, unequal rows |
| b64Enc | base 64 encoding | ☒☒☒ | ☒☐☐ | ☒☒☐ | ☒☒☐ | ☐☐ | ☐☐ | more network, not files |
| JsonTricks | json-tricks compact | ☒☒☐ | ☒☒☐ | ☒☐☐ | ☒☒☒ + | ☐☐ | ☒☒ | many types beyond numpy |
| MsgPack | Binary version of json | ☒☒☒ | ☒☒☐ | ☒☐☐ | ☒☒☐ + | ☐☐ | ☒☐ | |
| Pickle~ | python pickle | ☒☒☐ | ☐☐☐ | ☐☐☐ | ☒☒☒ | ☐☐ | ☒☒ | any obj, not backw. comp |
| Binary~ | pure raw data | ☒☒☒ | ☒☒☐ | ☒☒☒ | ☒☐☐ | ☐☐ | ☐☐ | dim & type separately |
| NPY | numpy .npy (no pickle) | ☒☒☒ | ☒☒☐ | ☒☐☐ | ☒☒☒ | ☐☐ | ☒☐ | with pickle mode OFF |
| NPYCompr | numpy .npz | ☒☒☒ | ☒☒☒ | ☒☐☐ | ☒☒☒ | ☐☐ | ☒☐ | multiple matrices |
| PNG | encoded as png image | ☒☒☐ | ☒☒☒ | ☐☐☐ | ☐☐☐ ++ | ☐☐ | ☐☐ | only 2D; for fun but works |
| FortUnf | fortran unformatted | ☒☒☒ | ☒☒☐ | ☒☐☐ | ☒☐☐ + | ☐☐ | ☒☐ | often compiler dependent |
| MatFile | Matlab .mat file | ☒☒☒ | ☒☒☐ | ☒☒☐ | ☒☒☒ + | ☐☐ | ☒☐ | multiple matrices |
  • ^ Two checks if it's small for dense data, three checks if it's also small for sparse data. All gzipped results are small for sparse data.
  • % E.g. easily supports 3D or higher arrays, unequal columns, inhomogeneous type columns...
  • ~ Also tested with gzip; the stats refer to the non-gzipped version. Gzipped is always much slower to write and a bit slower to read; for text formats it is at least 50% smaller.
  • + Rating refers to using a semi-popular package (probably scipy), as opposed to only python and numpy.
  • ++ Very easy (☒☒☒) with an unpopular and/or dedicated package, but the rating refers to only python and numpy.
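To make the "dim & type separately" note on the Binary row concrete, the sketch below contrasts three of the methods from the table: raw bytes, base64, and NPY. The helper names `to_raw` and `from_raw` are hypothetical and not taken from the benchmark code. The example only illustrates that raw data needs its shape and dtype carried alongside it, while `.npy` stores them in its header.

```python
import base64
import io

import numpy as np


def to_raw(arr):
    """Return the raw bytes plus the shape and dtype needed to rebuild the array."""
    arr = np.ascontiguousarray(arr)
    return arr.tobytes(), arr.shape, arr.dtype.str


def from_raw(raw, shape, dtype_str):
    """Rebuild an array from raw bytes; the result is a read-only view of `raw`."""
    return np.frombuffer(raw, dtype=np.dtype(dtype_str)).reshape(shape)


original = np.arange(12, dtype=np.float64).reshape(3, 4)

# Raw binary: the metadata travels outside the payload.
raw, shape, dtype_str = to_raw(original)
restored_raw = from_raw(raw, shape, dtype_str)

# Base64: the same bytes as ASCII text, about one third larger.
encoded = base64.b64encode(raw)
restored_b64 = from_raw(base64.b64decode(encoded), shape, dtype_str)

# NPY: shape and dtype are stored in the file header.
buffer = io.BytesIO()
np.save(buffer, original, allow_pickle=False)
buffer.seek(0)
restored_npy = np.load(buffer, allow_pickle=False)

print(len(raw), len(encoded))
```

Base64 turns every 3 bytes into 4 ASCII characters, which is why it suits text-only network channels better than files.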

You can install all dependencies using pip install -r requirements.pip. csv and NPY were done with numpy; json and compact json (JsonTricks) were done with pyjson_tricks; png was done with imgarray; fortran unformatted and matlab were done with scipy; pickle, base64 and gzipping were done with python built-ins. HDF5 uses h5py (not finished, see issue4). MessagePack uses msgpack-numpy. Seaborn is needed for plotting.
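For the gzipped text variants, a minimal sketch using only numpy and the built-in `gzip` module could look like the following. The matrix size and seed are arbitrary choices for illustration, the data stays in memory rather than on disk, and this is not necessarily how the benchmark itself writes its files.

```python
import gzip
import io

import numpy as np

rng = np.random.default_rng(seed=1)
matrix = rng.random((200, 50))  # hypothetical dense test matrix

# Write the matrix as CSV text into an in-memory buffer.
text_buffer = io.StringIO()
np.savetxt(text_buffer, matrix, delimiter=",")
csv_bytes = text_buffer.getvalue().encode("ascii")

# Compress the CSV text with the built-in gzip module.
gz_bytes = gzip.compress(csv_bytes)

# Read it back: decompress, then parse the CSV text again.
restored_text = gzip.decompress(gz_bytes).decode("ascii")
restored = np.loadtxt(io.StringIO(restored_text), delimiter=",")

print(f"csv: {len(csv_bytes)} bytes, csv.gz: {len(gz_bytes)} bytes")
```

The extra compress and decompress steps are where the slower write and read times of the gzipped variants come from.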

Results

Dense random matrix

https://raw.githubusercontent.com/mverleg/array_storage_benchmark/master/result/bm_random.png

https://raw.githubusercontent.com/mverleg/array_storage_benchmark/master/result/bm_long.png

Sparse random matrix

99% of values are zero, so compression ratios are very good.
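The effect of sparsity on compressed size can be sketched with numpy alone. The example below builds a matrix in which roughly 99% of the entries are zero and compares an uncompressed `.npy` payload with a compressed `.npz` payload. The shape, seed, and variable names are assumptions for illustration, not the benchmark's actual test data.

```python
import io

import numpy as np

rng = np.random.default_rng(seed=2)
shape = (500, 500)  # hypothetical size
values = rng.random(shape)
mask = rng.random(shape) < 0.01  # keep roughly 1% of the entries
sparse = np.where(mask, values, 0.0)

# Uncompressed .npy payload.
plain = io.BytesIO()
np.save(plain, sparse, allow_pickle=False)
plain_size = len(plain.getvalue())

# Compressed .npz payload (zip with deflate).
compressed = io.BytesIO()
np.savez_compressed(compressed, sparse=sparse)
compressed_size = len(compressed.getvalue())

# Round trip to confirm nothing is lost.
compressed.seek(0)
with np.load(compressed, allow_pickle=False) as archive:
    restored = archive["sparse"]

print(f"npy: {plain_size} bytes, npz: {compressed_size} bytes")
```

The long runs of zero bytes compress very well, while the few nonzero random doubles remain close to incompressible.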

https://raw.githubusercontent.com/mverleg/array_storage_benchmark/master/result/bm_sparse.png

Real data

Scattering probabilities for hydrogen and carbon monoxide (many doubles between 0 and 1, most close to 0). You can easily replace this with your own data by overwriting testdata.csv.

https://raw.githubusercontent.com/mverleg/array_storage_benchmark/master/result/bm_example.png

More methods

Pull requests with other methods (serious or otherwise) are welcome! There might be some ideas in the issue tracker.
