µjson is a small, C++11, UTF-8, JSON library.
Its highlights are:
- Small library with very simple API
- Outputs nicely formatted JSON
- Fast UTF-8 conformant parser
- Liberal license
The library uses the double-conversion library from the V8 JavaScript engine for portable conversion between ASCII and floating point numbers. An amalgamation of v1.1.5 of this library is included in the source distribution.
Unit tests are written using Catch. Catch is also included in the source distribution.
The scanner is generated using re2c. The source distribution includes the generated file, so this tool is only needed if you intend to modify the scanner.
µjson is licensed under the MIT license. See the LICENSE.md file in the source distribution.
The dependencies all have liberal software licenses. See LICENSE-3RD-PARTY.md for the details.
The library, examples, and unit tests can be built using CMake. The CMake scripts will automatically download re2c.
When using the library in another project, rather than using CMake, it may be easier to simply include the four source files directly in the project:
- ujson.hpp
- ujson.cpp
- double-conversion.h
- double-conversion.cc
Consider representing books defined using this simple struct as JSON:
struct book_t {
std::string title;
std::vector<std::string> authors;
int year;
};
The first step is to write a small function for converting a book into a ujson::value:
ujson::value to_json(book_t const &b) {
return ujson::object{ { "title", b.title },
{ "authors", b.authors },
{ "year", b.year } };
}
Using the above function an array of books can be converted to JSON as follows:
// Initializers follow the member order of book_t: title, authors, year.
book_t book1{ "Elements of Programming",
{ "Alexander A. Stepanov", "Paul McJones" },
2009 };
book_t book2{ "The C++ Programming Language, 4th Edition",
{ "Bjarne Stroustrup" },
2013 };
std::vector<book_t> book_list{ book1, book2 };
ujson::value value{ book_list };
std::string json = to_string(value);
std::cout << json << std::endl;
The last line will print:
[
{
"authors" : [
"Alexander A. Stepanov",
"Paul McJones"
],
"title" : "Elements of Programming",
"year" : 2009
},
{
"authors" : [
"Bjarne Stroustrup"
],
"title" : "The C++ Programming Language, 4th Edition",
"year" : 2013
}
]
Reconstructing the list of books is done by first parsing the JSON string into a ujson::value:
ujson::value new_value = ujson::parse(json);
assert(new_value == value);
Each element in this array is then converted to a book_t:
std::vector<ujson::value> array = array_cast(std::move(new_value));
std::vector<book_t> new_book_list;
new_book_list.reserve(array.size());
for (auto it = array.begin(); it != array.end(); ++it)
new_book_list.push_back(make_book(std::move(*it)));
assert(new_book_list == book_list);
The helper function make_book is implemented as follows:
book_t make_book(ujson::value v) {
if (!v.is_object())
throw std::invalid_argument("object expected for make_book");
book_t book;
std::vector<std::pair<std::string, ujson::value>> object =
object_cast(std::move(v));
auto it = find(object, "title");
if (it == object.end() || !it->second.is_string())
throw std::invalid_argument("'title' with type string not found");
book.title = string_cast(std::move(it->second));
it = find(object, "authors");
if (it == object.end() || !it->second.is_array())
throw std::invalid_argument("'authors' with type array not found");
std::vector<ujson::value> array = array_cast(std::move(it->second));
book.authors.reserve(array.size());
for (auto it = array.begin(); it != array.end(); ++it) {
if (!it->is_string())
throw std::invalid_argument("'authors' must be array of strings");
book.authors.push_back(string_cast(std::move(*it)));
}
it = find(object, "year");
if (it == object.end() || !it->second.is_number())
throw std::invalid_argument("'year' with type number not found");
book.year = int32_cast(it->second);
return book;
}
A JSON value must be null, a boolean, a number, a string, an array, or an object (see RFC 7159). In µjson the class ujson::value is used to represent all six of these types.
The actual type of a value can be queried using ujson::value::type or one of the convenience methods, such as ujson::value::is_null.
Values always contain one of the six possible types (ujson::value does not have a special uninitialized state).
The class ujson::value is a proper immutable value. Therefore, once a value has been created, it cannot be changed, though of course the variable holding it can be assigned a new value. Values can be compared for equality and inequality.
Casts are used to extract the embedded type again. For instance, bool_cast is used to extract the bool from values with boolean type. If a value is cast to the wrong type, a bad_cast exception is thrown.
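The idea of a checked cast on a tagged value can be illustrated with a minimal, standalone sketch. It is not µjson's implementation: the class `tiny_value`, the enum `kind`, and the functions `as_bool` and `as_double` are hypothetical names, only three of the six types are modelled, and `std::bad_cast` stands in for the library's own exception type.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for a JSON value; models only null, boolean, number.
enum class kind { null, boolean, number };

class tiny_value {
public:
    tiny_value() : kind_(kind::null) { payload_.number = 0.0; }
    tiny_value(bool b) : kind_(kind::boolean) { payload_.boolean = b; }
    tiny_value(double d) : kind_(kind::number) { payload_.number = d; }

    kind type() const { return kind_; }

    // Checked casts: refuse to reinterpret the payload as another type.
    friend bool as_bool(tiny_value const &v) {
        if (v.kind_ != kind::boolean)
            throw std::bad_cast(); // assumption: std::bad_cast as a stand-in
        return v.payload_.boolean;
    }
    friend double as_double(tiny_value const &v) {
        if (v.kind_ != kind::number)
            throw std::bad_cast();
        return v.payload_.number;
    }

private:
    kind kind_; // tag recording which union member is active
    union {
        bool boolean;
        double number;
    } payload_;
};
```

The tag is always set by a constructor, so in this sketch, too, there is no uninitialized state: a default constructed `tiny_value` is null.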
Default constructed values are null:
ujson::value null_value; // null
A constant null value is defined in the ujson namespace:
assert(ujson::null == null_value);
Values support stream i/o:
std::cout << null_value << std::endl; // prints 'null'
Values can be initialized with and assigned bools:
ujson::value boolean(true);
assert(bool_cast(boolean) == true);
std::cout << boolean << std::endl; // prints 'true'
boolean = false;
assert(bool_cast(boolean) == false);
std::cout << boolean << std::endl; // prints 'false'
Inside ujson::values, numbers are represented as 64-bit doubles:
ujson::value number = M_PI;
std::cout << number << std::endl; // prints '3.141592653589793'
The double value can be extracted using a double_cast:
double d = double_cast(number); // d == M_PI
The double-conversion library is used instead of the platform specific C runtime library to ensure lossless and portable roundtripping of doubles from ASCII to binary.
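For comparison, a round trip through the C runtime might look like the sketch below. It relies on the fact that 17 significant decimal digits are enough to identify any IEEE 754 double uniquely. The function name `roundtrips_via_c_runtime` is made up for illustration, and the code is not part of µjson. Both `snprintf` and `strtod` depend on the current locale and on the quality of the platform's C runtime, which are the portability concerns mentioned above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: format a double with 17 significant digits and
// parse it back using the platform's C runtime.
bool roundtrips_via_c_runtime(double d) {
    char buffer[32]; // large enough for "%.17g" of any finite double
    std::snprintf(buffer, sizeof buffer, "%.17g", d);
    return std::strtod(buffer, nullptr) == d;
}
```

Note that `%.17g` does not generally produce the shortest representation; `0.1`, for example, is printed with 17 digits rather than as `0.1`.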
Beware that only finite numbers are valid in JSON. Infinities and NaNs are not allowed:
number = std::numeric_limits<double>::infinity(); // throws bad_number
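A check of this kind can be sketched with `std::isfinite`. The function `require_finite` is a hypothetical name, and `std::domain_error` is only a stand-in for the `bad_number` exception mentioned above; how µjson performs the check internally is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical helper: accept only numbers that JSON can represent.
double require_finite(double d) {
    if (!std::isfinite(d)) // rejects +inf, -inf, and NaN
        throw std::domain_error("JSON numbers must be finite");
    return d;
}
```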
Numbers can also represent signed 32-bit integers:
number = 1024;
std::cout << number << std::endl; // prints '1024'
The integer value can be extracted using an int32_cast:
std::int32_t i = int32_cast(number); // i == 1024
Unsigned 32-bit integers are also supported.
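Since numbers are stored as doubles, an integer cast has to decide whether the stored double really is a 32-bit integer. The sketch below shows one way to test that. The name `fits_int32` is hypothetical, and whether µjson's int32_cast rejects, rounds, or truncates other values is not stated here, so this is an illustration of the range check only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if d is integral and within the int32 range.
// NaN fails both comparisons and is therefore rejected as well.
bool fits_int32(double d) {
    return d >= std::numeric_limits<std::int32_t>::min() &&
           d <= std::numeric_limits<std::int32_t>::max() &&
           std::trunc(d) == d;
}
```

Every 32-bit integer is exactly representable in a 64-bit double, so no precision is lost when storing one.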
Strings are stored internally as UTF-8:
ujson::value value = "\xC2\xA9 ujson 2014"; // copyright symbol
If the string is not zero-terminated or contains embedded zeros, the length must be passed too:
char title[] = { '\xC2', '\xB5', 'j', 's', 'o', 'n' }; // micro sign + json; char literals avoid a narrowing error
value = ujson::value(title, 6);
Strings passed to µjson must be valid UTF-8:
value = "\xF5"; // invalid utf-8; throws bad_string
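To make "valid UTF-8" concrete, here is a minimal standalone validator. It is a sketch, not µjson's scanner (which is generated with re2c), and the name `is_valid_utf8` is hypothetical. It rejects invalid lead bytes, truncated sequences, overlong encodings, UTF-16 surrogates, and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if [data, data + length) is valid UTF-8.
bool is_valid_utf8(const char *data, std::size_t length) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);
    std::size_t i = 0;
    while (i < length) {
        unsigned char lead = p[i];
        if (lead < 0x80) { ++i; continue; } // plain ASCII

        std::size_t extra;     // number of continuation bytes expected
        std::uint32_t cp;      // code point being decoded
        std::uint32_t minimum; // smallest code point valid for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minimum = 0x10000; }
        else return false; // stray continuation byte or lead byte 0xF8..0xFF

        if (length - i <= extra) return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minimum) return false;                 // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

The byte `0xF5` from the example above is rejected because any sequence starting with it would decode to a code point above U+10FFFF.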
If the string is known to be valid UTF-8, the validation step can be skipped by passing ujson::validate_utf8::no as the last argument of the constructor:
value = ujson::value("valid", 5, ujson::validate_utf8::no);
Strings can also be constructed from std::strings:
std::string string("ujson");
value = string; // copy into value
Alternatively, if the original string is no longer needed, the std::string can be moved into the value and the copy avoided:
value = std::move(string); // move into value
Strings can be accessed using the two string_cast overloads. The first accepts l-values and returns a ujson::string_view object:
auto view = string_cast(value);
std::cout << view.c_str() << std::endl; // prints 'ujson'
The returned string view object provides read-only access to the contained string.
The second string cast method accepts r-values and can be used to move a string out of a value:
string = string_cast(std::move(value)); // move string out of value
assert(value.is_null());
Moved-from values are always null.
See the "Implementation Details" section for more information on how µjson handles std::strings implemented using reference counting versus short string optimization.
Arrays are represented using ujson::array, which is simply a typedef for std::vector<ujson::value>:
auto array = ujson::array{ true, M_PI, "a string" };
ujson::value value(array);
Copying the array can be avoided by moving it into the value:
value = std::move(array);
Read-only access to the contained array is possible using array_cast:
ujson::array const &ref = array_cast(value);
The original array can be recovered by moving the array out of the value:
array = array_cast(std::move(value));
As shown in the tutorial, it is also possible to use a std::vector<T> of types T implicitly convertible to ujson::value, or a vector of types that supply a to_json function.
ujson::values are designed to be cheap to copy. Internally, strings, arrays, and objects are stored using std::shared_ptr<>s, so copying only requires incrementing a reference count. However, this sharing has implications for when it is possible to move:
ujson::value value1 = std::move(array);
ujson::value value2 = value1; // value2 shares immutable array with value1
auto tmp1 = array_cast(std::move(value1)); // note: copy!
auto tmp2 = array_cast(std::move(value2)); // move
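In short, moves are only possible if the value has exclusive ownership of the resource. The first cast has to copy because value2 still shares the array. Recall that moved-from values are null, so value1 no longer holds a reference afterwards and the second cast can move.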
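The "move only when exclusively owned" rule can be sketched with a plain `std::shared_ptr`. This is an illustration under assumptions, not µjson's code: the alias `shared_ints` and the function `take` are hypothetical, and the check on `use_count()` is only reliable when no other thread is copying the pointer at the same time.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using shared_ints = std::shared_ptr<std::vector<int>>; // hypothetical alias

// Hypothetical helper: extract the vector, moving it out if this handle is
// the sole owner and copying it otherwise. The handle is left empty either
// way, mirroring "moved-from values are null".
std::vector<int> take(shared_ints &handle) {
    std::vector<int> result;
    if (!handle)
        return result;               // nothing to take
    if (handle.use_count() == 1)
        result = std::move(*handle); // exclusive owner: steal the storage
    else
        result = *handle;            // shared: other owners still need it
    handle.reset();
    return result;
}
```

Calling `take` on two handles that share one vector copies for the first call and moves for the second, matching the tmp1 and tmp2 lines above.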
Objects are represented using ujson::object, which is simply a typedef for std::vector<std::pair<std::string, ujson::value>>:
auto object =
ujson::object{ { "a null", ujson::null },
{ "a bool", true },
{ "a number", M_LN2 },
{ "a string", "Hello, world!" },
{ "an array", ujson::array{ 1, 2, 3 } } };
ujson::value value(object);
As usual, copies can be avoided by moving:
value = std::move(object);
Read-only access to the contained object is possible using object_cast:
ujson::object const &ref = object_cast(value);
The original object can be recovered by moving the object out of the value:
object = object_cast(std::move(value));
For performance reasons, objects are implemented using a simple std::vector rather than a std::map. However, objects can still be constructed using a std::map<std::string,T> of types T implicitly convertible to ujson::value, or a map of types that supply a to_json function.
When a ujson::object is copied or moved into a ujson::value, the vector is sorted so that lookups can be performed using a binary search:
auto it = find(object, "a number");
assert(it->second == M_LN2);
In addition to ujson::find, there is also a ujson::at function which behaves like std::map::at.
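The sorted-vector lookup described above can be sketched with `std::sort` and `std::lower_bound`. The types and functions here (`member`, `members`, `sort_by_name`, `find_by_name`) are hypothetical and use `int` in place of ujson::value. This is not the library's ujson::find, only an instance of the same technique.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using member = std::pair<std::string, int>; // hypothetical name/value pair
using members = std::vector<member>;

// Sort once by name so that later lookups can use binary search.
void sort_by_name(members &m) {
    std::sort(m.begin(), m.end(), [](member const &a, member const &b) {
        return a.first < b.first;
    });
}

// Binary search in a vector sorted by name; returns m.end() if not found.
members::const_iterator find_by_name(members const &m, std::string const &name) {
    auto it = std::lower_bound(
        m.begin(), m.end(), name,
        [](member const &entry, std::string const &key) { return entry.first < key; });
    if (it != m.end() && it->first == name)
        return it;
    return m.end();
}
```

Compared with a std::map, a sorted vector keeps all entries in one contiguous allocation, while lookups remain logarithmic in the number of entries.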
Beware that names in objects must also be valid UTF-8:
object.push_back({ "invalid utf-8: \xFF", ujson::null });
value = object; // throws bad_string
Call ujson::parse to parse a buffer with UTF-8 encoded JSON:
auto value = ujson::parse("[ 1.0, 2.0, 3.0 ]");
If the buffer is not zero-terminated, which is the case with e.g. memory mapped files, the length must also be supplied:
const char *mapped_buffer = ..;
std::size_t mapped_length = ..;
auto value = ujson::parse(mapped_buffer, mapped_length);
Exceptions are thrown on syntax errors:
try {
auto value = ujson::parse("[ 1.0, 2.0, 3.0 "); // invalid syntax
...
} catch (std::exception const &e) {
std::cout << e.what() << std::endl; // prints 'Invalid syntax on line 1.'
}
Apart from syntax errors, the parser will also throw if a number is too large to fit in a double, if a string contains invalid UTF-8, and if the buffer contains trailing junk.
ujson::values can be converted to JSON using ujson::to_string:
auto array = ujson::array{ true, 1.0, "Sk\xC3\xA5l! \xF0\x9F\x8D\xBB" };
auto object =
ujson::object{ { "a null", ujson::null },
{ "a bool", false },
{ "a number", 1.61803398875 },
{ "a string", "R\xC3\xB8""dgr\xC3\xB8""d med fl\xC3\xB8""de." },
{ "an array", array } };
std::cout << to_string(object) << std::endl;
This produces:
{
"a bool" : false,
"a null" : null,
"a number" : 1.61803398875,
"a string" : "Rødgrød med fløde.",
"an array" : [
true,
1,
"Skål! 🍻"
]
}
By default µjson indents by four spaces. It is possible to change this and also to control whether UTF-8 is allowed in the output:
ujson::to_string_options compact_ascii;
compact_ascii.indent_amount = 0;
compact_ascii.encoding = ujson::character_encoding::ascii;
std::cout << to_string(object, compact_ascii) << std::endl;
With ASCII output, all non-ASCII characters are escaped and with zero indentation all insignificant white space is elided:
{"a bool":false,"a null":null,"a number":1.61803398875,"a string":"R\u00F8dgr\u00F8d med fl\u00F8de.","an array":[true,1,"Sk\u00E5l! \uD83C\uDF7B"]}
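The escapes in the ASCII output follow the JSON rule that code points outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair. The sketch below shows that arithmetic. The functions `escape_unit` and `escape_code_point` are hypothetical names and not part of µjson's API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format one UTF-16 code unit as \uXXXX with uppercase hex digits.
std::string escape_unit(std::uint32_t unit) {
    char buffer[8]; // backslash, 'u', four hex digits, terminator
    std::snprintf(buffer, sizeof buffer, "\\u%04X", static_cast<unsigned>(unit));
    return buffer;
}

// Escape a Unicode code point, using a surrogate pair above U+FFFF.
std::string escape_code_point(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000)
        return escape_unit(cp);
    std::uint32_t v = cp - 0x10000;          // 20 bits remain
    return escape_unit(0xD800 + (v >> 10))   // high surrogate: top 10 bits
         + escape_unit(0xDC00 + (v & 0x3FF)); // low surrogate: low 10 bits
}
```

For U+00F8 (ø) this yields `\u00F8`, and for U+1F37B (the emoji in the array above) it yields `\uD83C\uDF7B`, matching the output shown.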
ujson::value is implemented using small object optimization. This avoids the need for expensive heap allocations for simple types, since the value is instead stored directly inside the object.
type | heap allocation |
---|---|
null | no |
boolean | no |
number | no |
string | depends |
array | yes |
object | yes |
Arrays and objects do require heap allocations, since they are stored internally using a std::shared_ptr (usually just a single allocation is required, since most STL implementations allocate the object and control block together). While this does make construction more expensive, it has the advantage that copying values containing arrays or objects is cheap, since it only amounts to incrementing a reference count.
Also, since the reference count used by std::shared_ptr is thread-safe and the pointed-to value is immutable, passing a ujson::value by value to another thread is free from race conditions:
auto value = ujson::parse(...);
auto future = std::async(std::launch::async, [value] {
/* do significant work */ });
Strings in the Standard Template Library are implemented using either short string optimization (SSO) or reference counting. Clang's libc++ and Visual Studio use the former approach while GCC's libstdc++ uses the latter. Briefly, in an implementation using reference counting, a std::string stores a pointer to the string data and a reference count. Copy on write (COW) is used to ensure that a string gets its own unique copy of the string data if modified. In an implementation using SSO, the string object stores a pointer and a small buffer. Short strings are stored in the buffer, thus avoiding the heap allocation, whereas longer strings are stored on the heap. The size of the buffer for short strings is implementation defined. See the 'sso buffer size' column in the following table.
platform | arch | ujson::value | std::string | sso buffer size |
---|---|---|---|---|
clang 3.4 (Xcode 5.1.1) | 32-bit | 16 bytes | 12 bytes | 10 bytes |
clang 3.4 (Xcode 5.1.1) | 64-bit | 32 bytes | 24 bytes | 22 bytes |
gcc 4.8.3 (via brew) | 32-bit | 12 bytes | 4 bytes | N/A |
gcc 4.8.3 (via brew) | 64-bit | 24 bytes | 8 bytes | N/A |
vs2013 update 3 | 32-bit | 24 bytes | 28 bytes | 15 bytes |
vs2013 update 3 | 64-bit | 32 bytes | 40 bytes | 15 bytes |
vs2013 ctp1 | 32-bit | 24 bytes | 28 bytes | 15 bytes |
vs2013 ctp1 | 64-bit | 32 bytes | 32 bytes | 15 bytes |
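The std::string columns can be probed on a given toolchain with a small sketch like the one below. The helper names `inline_capacity` and `stored_inline` are hypothetical. The first relies on the common (but not standard-guaranteed) behaviour that an empty string on an SSO implementation reports the size of its inline buffer as its capacity. The second checks whether a string's character data lies inside the string object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Capacity of an empty string; on SSO implementations this is typically
// the number of characters that fit in the inline buffer.
std::size_t inline_capacity() {
    return std::string().capacity();
}

// True if the character data of s is stored within the string object.
// std::less gives a well-defined ordering even for unrelated pointers.
bool stored_inline(std::string const &s) {
    const char *begin = reinterpret_cast<const char *>(&s);
    const char *end = begin + sizeof(std::string);
    std::less<const char *> before;
    return !before(s.data(), begin) && before(s.data(), end);
}
```

Printing `sizeof(std::string)` and `inline_capacity()` reproduces the last two columns for the toolchain at hand. Results for current compilers may differ from the versions listed in the table.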
With a COW std::string, µjson simply stores the string object inside the ujson::value without doing any allocations. Copying is still inexpensive since copying COW strings is cheap.
With an SSO std::string, short strings are stored directly in the ujson::value object and therefore do not require any heap allocations. Long strings are stored using a std::shared_ptr, so they require a single allocation. Like arrays and objects, copying long strings is therefore cheap.
In summary, copy constructing and copy assigning ujson::values is always an inexpensive operation, requiring at most bumping a reference count or copying a small buffer, but never any heap allocations.