  • Stars: 415
  • Rank: 103,499 (Top 3 %)
  • Language: C++
  • License: Other
  • Created: almost 9 years ago
  • Updated: 7 months ago

VelocyPack (VPack) - a fast and compact format for serialization and storage

Motivation

These days, JSON (JavaScript Object Notation, see ECMA-404) is used in many cases where data has to be exchanged. Lots of protocols between different services use it, and databases store JSON (document stores naturally, but others increasingly as well). It is popular because it is simple, human-readable, and yet surprisingly versatile, despite its limitations.

At the same time there is a plethora of alternatives, ranging from XML through Universal Binary JSON, MongoDB's BSON, MessagePack, BJSON (binary JSON) and Apache Thrift to Google's Protocol Buffers and ArangoDB's shaped JSON.

When looking into this, we were surprised to find that none of these formats manages to combine compactness, platform independence, fast access to sub-objects and rapid conversion from and to JSON.

We have invented VPack because we need a binary format that

  • is self-contained and schemaless
  • is compact
  • is largely platform independent (see Portability)
  • covers all of JSON plus dates, integers, binary data and arbitrary precision numbers
  • can be used in a database kernel, for example for indexes, so it must allow efficient access to sub-documents (array and object members)
  • can be converted to and from JSON rapidly
  • avoids too many memory allocations
  • gives flexibility to assemble objects, such that sub-objects reside in the database in an unchanged way
  • allows the use of an external table for frequently used attribute names
  • allows the type and length of a given object to be read quickly from its first byte(s)

All this gives us the possibility to use the same byte sequence of data for transport, storage and (read-only) work. Using a single data format not only eliminates a lot of conversions but can also reduce runtime memory usage, as data needs only a single in-memory representation.
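To make the "type and length from the first byte(s)" requirement concrete, here is a minimal sketch of a tagged binary header. The tag values, the `ToyType` and `ToyHeader` types and the `readHeader` function are all hypothetical and invented for illustration; this is not the actual VPack encoding, which is defined in VelocyPack.md.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy layout (NOT the real VPack encoding):
//   0x00        null,   total size 1
//   0x01/0x02   false/true, total size 1
//   0x10 + n    string of n bytes (n = 0..15), total size 1 + n
//   0x20, len   array; len is the total byte size of the array (2..255)
enum class ToyType { Null, Bool, String, Array };

struct ToyHeader {
  ToyType type;
  std::size_t byteSize;  // size of the whole value, header included
};

// Determines type and total size by inspecting at most the first two bytes.
// The payload itself is never walked. Returns false for unknown tags or
// truncated input.
bool readHeader(const std::uint8_t* p, std::size_t available, ToyHeader& out) {
  if (available == 0) return false;
  const std::uint8_t tag = p[0];
  if (tag == 0x00) {
    out = ToyHeader{ToyType::Null, 1};
    return true;
  }
  if (tag == 0x01 || tag == 0x02) {
    out = ToyHeader{ToyType::Bool, 1};
    return true;
  }
  if (tag >= 0x10 && tag <= 0x1f) {
    // The length is folded into the tag byte itself.
    out = ToyHeader{ToyType::String, static_cast<std::size_t>(tag - 0x10) + 1};
    return true;
  }
  if (tag == 0x20 && available >= 2 && p[1] >= 2) {
    // The second byte carries the total size, so a reader can skip the
    // whole array without looking at its members.
    out = ToyHeader{ToyType::Array, p[1]};
    return true;
  }
  return false;
}
```

Because the size is known up front, a reader can skip over a value it is not interested in, which is what makes it practical to work directly on the stored byte sequence.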

The other popular formats we looked at all have some deficiency with respect to the above list. To name but a few:

  • JSON itself lacks some data types (dates and binary data) and does not provide quick sub-value access without parsing. Parsing JSON is also quite a challenge performance-wise
  • XML is not compact and is not good with binary data. It also lacks quick sub-value access
  • BSON gets quite a lot right with respect to data types, but is seriously lacking w.r.t. sub-value access. Furthermore, it is not very compact and quite wasteful space-wise when storing array values
  • Apache Thrift and Google's Protocol Buffers are not schemaless and self-contained. Their transport format is a serialization that is not good for rapid sub-value access
  • MessagePack is probably the closest to our shopping list. It has decent data types and is quite compact. However, we found that one can do better in terms of compactness for some cases. More importantly for us, MessagePack provides no quick sub-value access
  • Our own shaped JSON (used in ArangoDB as internal storage format) has very quick sub-value access, but the shape data is kept outside the actual data, so the shaped values are not self-contained. Furthermore, we have run into scalability issues on multi-core because of the shared data structures used for interpretation of the values
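The recurring complaint in this list is missing "quick sub-value access". One common way to get it, sketched below, is to store a table of member offsets next to the data so that member i can be located without decoding members 0 to i-1. The layout and the names `buildToyArray` and `memberAt` are hypothetical and chosen for brevity; the real VPack layout is described in VelocyPack.md and differs from this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy array of strings:
//   [n][off_0]...[off_{n-1}][len_0][bytes_0]...[len_{n-1}][bytes_{n-1}]
// Every field is one byte, so the whole buffer must not exceed 255 bytes.
std::vector<std::uint8_t> buildToyArray(const std::vector<std::string>& items) {
  std::size_t total = 1 + items.size();
  for (const auto& s : items) total += 1 + s.size();
  assert(total <= 255);  // toy limitation: offsets and lengths are one byte

  std::vector<std::uint8_t> out;
  out.push_back(static_cast<std::uint8_t>(items.size()));
  // Offset table: position of each member, counted from the buffer start.
  std::size_t offset = 1 + items.size();
  for (const auto& s : items) {
    out.push_back(static_cast<std::uint8_t>(offset));
    offset += 1 + s.size();
  }
  // Payload: each member is a length byte followed by its bytes.
  for (const auto& s : items) {
    out.push_back(static_cast<std::uint8_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
  }
  return out;
}

// Constant-time member lookup: one read in the offset table, one length
// byte, then the payload. No other member is touched.
std::string memberAt(const std::vector<std::uint8_t>& buf, std::size_t i) {
  assert(!buf.empty() && i < buf[0]);
  const std::size_t pos = buf[1 + i];
  const std::size_t len = buf[pos];
  const auto first = buf.begin() + static_cast<std::ptrdiff_t>(pos + 1);
  return std::string(first, first + static_cast<std::ptrdiff_t>(len));
}
```

For example, `memberAt(buildToyArray({"alpha", "be", "gamma"}), 2)` reads the third offset and jumps straight to "gamma". A format without such a table has to scan past every preceding member first.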

Any new data format must be backed by C++ classes to allow

  • easy and fast parsing from JSON
  • easy and convenient buildup without too many memory allocations
  • fast access to data and its sub-objects (for arrays and objects)
  • flexible memory management
  • fast dumping to JSON

The VelocyPack format is an attempt to achieve all this.

This repository contains a C++ library for building, manipulating and serializing VPack data. It is the reference implementation for the VelocyPack format. The library is written in C++20 so it should compile on many up-to-date systems.

The VelocyPack format and library are used extensively in the ArangoDB database.

Specification

See the file VelocyPack.md for a detailed description of the VPack format.

Performance

See the file Performance.md for a thorough comparison to other formats like JSON itself, MessagePack and BSON. We look at file sizes as well as parsing and conversion performance.

Building the VPack library

The VPack library can be built on Linux, macOS and Windows. It will likely compile and work on other platforms for which a recent version of CMake and a working C++20-enabled compiler are available.

See the file Install.md for compilation and installation instructions.

Using the VPack library

Please consult the file examples/API.md for usage examples, and the file examples/Embedding.md for information about how to embed the library into client applications.

Testing and validating with fuzzer

The fuzzer tool can be used to generate random VPack or JSON structures and validate them. It can be run for multiple iterations and in parallel, and a seed can be provided for the random generation. Please consult the file tools/README.md for usage information.
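The point of a seed is reproducibility: a failing random input can be regenerated exactly. The following is a minimal sketch of that idea only, not the fuzzer's implementation. The functions `randomJson` and `randomJsonFromSeed` are hypothetical; the actual tool and its options are documented in tools/README.md.

```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical generator: produces a small random JSON text. Nesting is
// limited by maxDepth; at depth 0 only scalar values are produced.
std::string randomJson(std::mt19937_64& rng, int maxDepth) {
  const unsigned kinds = maxDepth > 0 ? 5u : 3u;
  switch (rng() % kinds) {
    case 0:
      return "null";
    case 1:
      return (rng() % 2 == 0) ? "true" : "false";
    case 2:
      return std::to_string(rng() % 1000);
    case 3: {  // array with 0..3 members
      std::string out = "[";
      const std::uint64_t n = rng() % 4;
      for (std::uint64_t i = 0; i < n; ++i) {
        if (i > 0) out += ",";
        out += randomJson(rng, maxDepth - 1);
      }
      return out + "]";
    }
    default: {  // object with 0..3 members and keys "k0", "k1", ...
      std::string out = "{";
      const std::uint64_t n = rng() % 4;
      for (std::uint64_t i = 0; i < n; ++i) {
        if (i > 0) out += ",";
        out += "\"k" + std::to_string(i) + "\":";
        out += randomJson(rng, maxDepth - 1);
      }
      return out + "}";
    }
  }
}

// The same seed always yields the same document, so a failure found in
// one run can be replayed in another.
std::string randomJsonFromSeed(std::uint64_t seed, int maxDepth) {
  std::mt19937_64 rng(seed);
  return randomJson(rng, maxDepth);
}
```

Running several iterations then amounts to calling `randomJsonFromSeed` with a different seed per iteration and recording the seed alongside any failure.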

Contributing

We welcome bug fixes and patches from third-party contributors!

Please follow the guidelines in CONTRIBUTING.md if you want to contribute to VelocyPack. Have a look for the tag help wanted in the issue tracker!

We also provide a Go version of VPack in the go-velocypack repository and a Java version in the java-velocypack repository.

Additionally, there is a third-party VPack implementation for PHP.
