Fast binary logger for C++

Highlights

  • Logs messages in a compact binary format
  • Fast
    • Hundreds of millions of logs per second
    • Average latency of 1-3 ns for basic data types
    • See benchmarks
  • Provides an unpacker to deflate the log messages
  • Uses fmtlib to format the logs
  • Synchronous logging - not thread safe
  • Header-only library
    • Single header file version available here
  • Requires C++20
  • MIT License

Usage and Performance

The following code logs 1 billion integers to file.

#include <binary_log/binary_log.hpp>

int main()
{
  binary_log::binary_log log("log.out");

  for (int i = 0; i < 1E9; ++i)
    BINARY_LOG(log, "Hello logger, msg number: {}", i);
}

On a modern workstation desktop, the above code executes in ~2s.

Type             Value
Time Taken       1.935 s
Throughput       2.06 GB/s
Performance      516 million logs/s
Average Latency  1.73 ns
File Size        ~4 GB

foo@bar:~/dev/binary_log$ time ./build/examples/billion_integers/billion_integers

real    0m1.935s
user    0m0.906s
sys     0m1.000s

foo@bar:~/dev/binary_log$ ls -lart log.out*
-rw-r--r-- 1 pranav pranav         10 Sep 20 11:46 log.out.runlength
-rw-r--r-- 1 pranav pranav         33 Sep 20 11:46 log.out.index
-rw-r--r-- 1 pranav pranav 4000000002 Sep 20 11:46 log.out

Deflate the logs

These binary log files can be deflated using the provided unpacker app:

foo@bar:~/dev/binary_log$ time ./build/tools/unpacker/unpacker log.out > log.deflated

real    2m19.853s
user    1m16.078s
sys     0m50.969s

foo@bar:~/dev/binary_log$ ls -lart log.deflated
-rw-r--r-- 1 pranav pranav 35888888890 Dec  6 08:09 log.deflated

foo@bar:~/dev/binary_log$ wc -l log.deflated
1000000000 log.deflated

foo@bar:~/dev/binary_log$ head log.deflated
Hello logger, msg number: 0
Hello logger, msg number: 1
Hello logger, msg number: 2
Hello logger, msg number: 3
Hello logger, msg number: 4
Hello logger, msg number: 5
Hello logger, msg number: 6
Hello logger, msg number: 7
Hello logger, msg number: 8
Hello logger, msg number: 9

foo@bar:~/dev/binary_log$ tail log.deflated
Hello logger, msg number: 999999990
Hello logger, msg number: 999999991
Hello logger, msg number: 999999992
Hello logger, msg number: 999999993
Hello logger, msg number: 999999994
Hello logger, msg number: 999999995
Hello logger, msg number: 999999996
Hello logger, msg number: 999999997
Hello logger, msg number: 999999998
Hello logger, msg number: 999999999

Type                Value
Time Taken          2m 19s
Throughput          258 MB/s
Original File Size  ~5 GB
Deflated File Size  ~35 GB
Log Compression     7x

See the Benchmarks section for more performance metrics.

Design Goals & Decisions

  • Implement a single-threaded, synchronous logger - Do not provide thread safety
    • If the user wants multi-threaded behavior, the user can choose and implement their own queueing solution
    • There are numerous well-known lock-free queues available for this purpose (moodycamel::ConcurrentQueue, atomic_queue, etc.); let the user choose the technology they want to use.
    • The latency of enqueuing into a lock-free queue is large enough to matter
      • Users who do not care about multi-threaded scenarios should not suffer the cost
      • Looking at the atomic_queue benchmarks, the average round-trip latency across many state-of-the-art multi-producer, multi-consumer queues, to send and receive a 4-byte integer (between 2 threads, using 2 queues) is around 150-250 ns.
  • Avoid writing static information more than once
    • Examples of static information: the format string, the number of format args, and type of each format arg
    • Store the static information in an "index" file
    • Store the dynamic information in the log file (refer to the index file where possible)
  • Do as little work as possible in the runtime hot path
    • No formatting of any kind
    • All formatting will happen offline using an unpacker that deflates the binary logs
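
The first design goal leaves thread safety to the caller. A minimal sketch of one such arrangement follows, assuming a plain mutex-protected std::queue rather than the lock-free queues named above: several producer threads push values, and a single consumer thread owns the sink. The names single_consumer_funnel and funnel_demo, and the counting sink, are hypothetical stand-ins and not part of binary_log; in real use the sink would wrap a BINARY_LOG call on a logger that only the consumer thread touches.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper, not part of binary_log: funnels values from many
// producer threads to one consumer thread that owns a non-thread-safe sink.
template <typename T, typename Sink>
class single_consumer_funnel {
public:
  explicit single_consumer_funnel(Sink sink)
      : sink_(std::move(sink)), worker_([this] { run(); }) {}

  // Signals shutdown, lets the consumer drain the queue, then joins it.
  ~single_consumer_funnel() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Safe to call from any thread.
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(value));
    }
    cv_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
      while (!queue_.empty()) {
        T value = std::move(queue_.front());
        queue_.pop();
        lock.unlock();
        sink_(value);  // only the consumer thread ever calls the sink
        lock.lock();
      }
      if (done_) return;
    }
  }

  Sink sink_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<T> queue_;
  bool done_ = false;
  std::thread worker_;  // declared last so every other member exists first
};

// Demo: producers push integers; the sink (a stand-in for a BINARY_LOG call)
// only counts how many values arrived.
inline std::uint64_t funnel_demo(int producers, int per_producer) {
  assert(producers >= 0 && per_producer >= 0);
  std::uint64_t received = 0;
  {
    auto sink = [&received](int) { ++received; };
    single_consumer_funnel<int, decltype(sink)> funnel(sink);
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
      threads.emplace_back([&funnel, per_producer] {
        for (int i = 0; i < per_producer; ++i) funnel.push(i);
      });
    for (auto& t : threads) t.join();
  }  // the funnel destructor drains the queue and joins the consumer
  return received;
}
```

Calling funnel_demo(4, 1000) pushes 4000 values through the funnel. Every push pays for a lock and a wake-up, which is the cost the design goals above keep out of the single-threaded path.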

How it Works

binary_log splits the logging into three files:

  1. Index file contains all the static information from the logs, e.g., format string, number of args, type of each arg etc.
    • If a format argument is marked as constant using binary_log::constant, the value of the arg is also stored in the index file
  2. Log file contains two pieces of information per log call:
    1. An index into the index table (in the index file) to know which format string was used
      • If run-length encoding applies (the same log call repeats), this index might not be written; instead, the final run length is written to the runlength file
    2. The value of each argument
  3. Runlength file contains run lengths: if the same log call is made 5 times in a row, that count is stored here (instead of storing the index 5 times in the log file)
    • NOTE: Runlengths are only stored if the runlength > 1 (to avoid the inflation case with RLE)
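
The split described above can be modeled with a toy in-memory encoder. This is an illustrative sketch only, not binary_log's actual on-disk layout: the type toy_log, its one-byte format-string ids, the single 32-bit argument, and the (id, count) run-length entries are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy model of the three outputs. NOT the real binary_log file format.
struct toy_log {
  std::vector<std::string> index;  // static info: one entry per format string
  std::vector<std::uint8_t> log;   // dynamic info: ids and argument bytes
  std::vector<std::pair<std::uint8_t, std::uint64_t>> runlengths;  // (id, count)

  // Records one log call with a single 32-bit argument.
  void write(const std::string& fmt, std::uint32_t arg) {
    const std::uint8_t id = intern(fmt);
    if (run_count_ > 0 && id == run_id_) {
      ++run_count_;  // same call repeated: do not write the id again
    } else {
      flush();            // close the previous run, if any
      log.push_back(id);  // first call of a new run writes its id once
      run_id_ = id;
      run_count_ = 1;
    }
    // The argument value is always written (little-endian here).
    for (int shift = 0; shift < 32; shift += 8)
      log.push_back(static_cast<std::uint8_t>(arg >> shift));
  }

  // Closes the current run; only runs longer than 1 are recorded.
  void flush() {
    if (run_count_ > 1) runlengths.emplace_back(run_id_, run_count_);
    run_count_ = 0;
  }

private:
  // Returns the id of fmt, adding it to the index on first use.
  std::uint8_t intern(const std::string& fmt) {
    for (std::size_t i = 0; i < index.size(); ++i)
      if (index[i] == fmt) return static_cast<std::uint8_t>(i);
    assert(index.size() < 256 && "toy ids are a single byte");
    index.push_back(fmt);
    return static_cast<std::uint8_t>(index.size() - 1);
  }

  std::uint8_t run_id_ = 0;
  std::uint64_t run_count_ = 0;
};
```

In this toy model, five consecutive calls with the same format string followed by one call with another leave two index entries, 26 log bytes (two id bytes plus six 4-byte arguments), and a single run-length entry with count 5.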

Constants

One can specify a log format argument as a constant by wrapping the value with binary_log::constant(...). When this is detected, the value is stored in the index file instead of the log file as it is now considered "static information" and does not change between calls.

for (auto i = 0; i < 1E9; ++i) {
  BINARY_LOG(log, "Joystick {}: x_min={}, x_max={}, y_min={}, y_max={}",
             binary_log::constant("Nintendo Joycon"),
             binary_log::constant(-0.6),
             binary_log::constant(+0.65),
             binary_log::constant(-0.54),
             binary_log::constant(+0.71));
}

The above loop runs in under 500 ms. The final output is compact at just 118 bytes and contains all the information needed to deflate the log (if needed).

File               Size
log.out            1 byte
log.out.runlength  6 bytes
log.out.index      111 bytes

foo@bar:~/dev/binary_log$ ls -lart log.out*
-rw-r--r-- 1 pranav pranav   6 Dec  5 08:41 log.out.runlength
-rw-r--r-- 1 pranav pranav 111 Dec  5 08:41 log.out.index
-rw-r--r-- 1 pranav pranav   1 Dec  5 08:41 log.out

foo@bar:~/dev/binary_log$ hexdump -C log.out.index
00000000  33 4a 6f 79 73 74 69 63  6b 20 7b 7d 3a 20 78 5f  |3Joystick {}: x_|
00000010  6d 69 6e 3d 7b 7d 2c 20  78 5f 6d 61 78 3d 7b 7d  |min={}, x_max={}|
00000020  2c 20 79 5f 6d 69 6e 3d  7b 7d 2c 20 79 5f 6d 61  |, y_min={}, y_ma|
00000030  78 3d 7b 7d 05 0c 0b 0b  0b 0b 01 0f 4e 69 6e 74  |x={}........Nint|
00000040  65 6e 64 6f 20 4a 6f 79  63 6f 6e 01 33 33 33 33  |endo Joycon.3333|
00000050  33 33 e3 bf 01 cd cc cc  cc cc cc e4 3f 01 48 e1  |33..........?.H.|
00000060  7a 14 ae 47 e1 bf 01 b8  1e 85 eb 51 b8 e6 3f     |z..G.......Q..?|
0000006f
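
The constant values can be recovered from the hexdump above by hand. Reading the dump, each double appears to be stored as 8 little-endian IEEE-754 bytes following a one-byte marker; that reading is inferred from the dump itself, not from a format specification. The helper names below are hypothetical and not part of binary_log.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == 8 && std::numeric_limits<double>::is_iec559,
              "sketch assumes 64-bit IEEE-754 doubles");

using le_bytes = std::array<std::uint8_t, 8>;

// Reassembles a double from 8 bytes stored least-significant byte first.
// Works regardless of the host's byte order.
inline double double_from_le_bytes(const le_bytes& bytes) {
  std::uint64_t bits = 0;
  for (std::size_t i = 8; i-- > 0;) bits = (bits << 8) | bytes[i];
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Byte sequences copied from the log.out.index hexdump.
inline void check_joystick_constants() {
  const le_bytes x_min{0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0xe3, 0xbf};
  const le_bytes x_max{0xcd, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe4, 0x3f};
  const le_bytes y_min{0x48, 0xe1, 0x7a, 0x14, 0xae, 0x47, 0xe1, 0xbf};
  const le_bytes y_max{0xb8, 0x1e, 0x85, 0xeb, 0x51, 0xb8, 0xe6, 0x3f};
  assert(double_from_le_bytes(x_min) == -0.6);
  assert(double_from_le_bytes(x_max) == 0.65);
  assert(double_from_le_bytes(y_min) == -0.54);
  assert(double_from_le_bytes(y_max) == 0.71);
}
```

The four decoded values match the arguments passed to binary_log::constant in the loop, which is consistent with the constants living in the index file rather than in log.out.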

Benchmarks

System Details

Type           Value
Processor      11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50 GHz
Installed RAM  32.0 GB (31.9 GB usable)
SSD            ADATA SX8200PNP
OS             Ubuntu 20.04 LTS running on WSL in Windows 11
C++ Compiler   g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0

foo@bar:~/dev/binary_log$ ./build/benchmark/binary_log_benchmark
2022-09-20T12:59:39-05:00
Running ./build/benchmark/binary_log_benchmark
Run on (16 X 3504 MHz CPU s)
Load Average: 0.52, 0.58, 0.59
------------------------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
BM_binary_log_static_integer<uint8_t>/42                      1.22 ns         1.20 ns    560000000 Latency=1.19978ns Logs/s=833.488M/s
BM_binary_log_static_integer<uint16_t>/395                    1.43 ns         1.43 ns    448000000 Latency=1.42997ns Logs/s=699.317M/s
BM_binary_log_static_integer<uint32_t>/3123456789             1.89 ns         1.84 ns    373333333 Latency=1.84152ns Logs/s=543.03M/s
BM_binary_log_static_integer<uint64_t>/9876543123456789       5.45 ns         2.76 ns    248888889 Latency=2.76228ns Logs/s=362.02M/s
BM_binary_log_static_integer<int8_t>/-42                      1.25 ns         1.26 ns    560000000 Latency=1.25558ns Logs/s=796.444M/s
BM_binary_log_static_integer<int16_t>/-395                    1.54 ns         1.57 ns    448000000 Latency=1.56948ns Logs/s=637.156M/s
BM_binary_log_static_integer<int32_t>/-123456789              1.94 ns         1.97 ns    373333333 Latency=1.96708ns Logs/s=508.369M/s
BM_binary_log_static_integer<int64_t>/-9876543123456789       4.11 ns         2.92 ns    235789474 Latency=2.91574ns Logs/s=342.967M/s
BM_binary_log_static_float                                    1.82 ns         1.84 ns    407272727 Latency=1.84152ns Logs/s=543.03M/s
BM_binary_log_static_double                                   3.29 ns         2.73 ns    263529412 Latency=2.7274ns Logs/s=366.65M/s
BM_binary_log_static_string                                   4.93 ns         2.92 ns    235789474 Latency=2.91574ns Logs/s=342.967M/s
BM_binary_log_random_integer<uint8_t>                         5.75 ns         5.72 ns    112000000 Latency=5.71987ns Logs/s=174.829M/s
BM_binary_log_random_integer<uint16_t>                        6.08 ns         6.14 ns    112000000 Latency=6.13839ns Logs/s=162.909M/s
BM_binary_log_random_integer<uint32_t>                        7.51 ns         7.67 ns     89600000 Latency=7.67299ns Logs/s=130.327M/s
BM_binary_log_random_integer<uint64_t>                        15.0 ns         15.0 ns     44800000 Latency=14.9972ns Logs/s=66.6791M/s
BM_binary_log_random_integer<int8_t>                          5.70 ns         5.72 ns    112000000 Latency=5.71987ns Logs/s=174.829M/s
BM_binary_log_random_integer<int16_t>                         5.84 ns         5.86 ns    112000000 Latency=5.85938ns Logs/s=170.667M/s
BM_binary_log_random_integer<int32_t>                         7.89 ns         7.67 ns     89600000 Latency=7.67299ns Logs/s=130.327M/s
BM_binary_log_random_integer<int64_t>                         14.9 ns         15.0 ns     44800000 Latency=14.9972ns Logs/s=66.6791M/s
BM_binary_log_random_real<float>                              6.29 ns         6.25 ns    100000000 Latency=6.25ns Logs/s=160M/s
BM_binary_log_random_real<double>                             11.6 ns         11.7 ns     64000000 Latency=11.7188ns Logs/s=85.3333M/s
BM_binary_log_billion_integers                          2320246800 ns   1765625000 ns            1 Latency=1.76562ns Logs/s=566.372M/s
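
The Latency and Logs/s counters in the table are related by simple arithmetic, and in the rows shown the Latency counter matches the CPU column. A small sketch of that relationship, with hypothetical helper names that are not part of binary_log or the benchmark code:

```cpp
// Average latency in nanoseconds, given total time and number of log calls.
constexpr double latency_ns(double total_ns, double log_calls) {
  return total_ns / log_calls;
}

// Logs per second implied by a per-call latency in nanoseconds.
constexpr double logs_per_second(double latency_in_ns) {
  return 1e9 / latency_in_ns;
}

// One nanosecond per call corresponds to one billion calls per second.
static_assert(logs_per_second(1.0) == 1e9);
```

For example, the billion-integers row reports 1765625000 ns of CPU time for 1E9 calls, which gives about 1.766 ns per call, and the uint8_t row's latency of 1.19978 ns corresponds to roughly 833 million logs per second.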

Supported Format Argument Types

binary_log supports a limited number of types of format arguments. They are:

bool,
char,
uint8_t, uint16_t, uint32_t, uint64_t,
int8_t, int16_t, int32_t, int64_t,
float, double,
const char*,
std::string,
std::string_view

Building and installing

See the BUILDING document.

Generating Single Header

python3 utils/amalgamate/amalgamate.py -c single_include.json -s .

Contributing

See the CONTRIBUTING document.

License

The project is available under the MIT license.
