• Stars
    star
    593
  • Rank 75,443 (Top 2 %)
  • Language
    C++
  • License
    Apache License 2.0
  • Created almost 5 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A high performance C++ log library, producing structured binary logs

Binlog

Lifecycle Active

A high performance C++ log library to produce structured binary logs.

BINLOG_INFO("Log anything! {}, {} or even {}", 1.2f, std::vector{3,4,5}, AdaptedStruct{1, "Foo"});

Motivation

Consider the following log excerpt, taken from an application, that computes a float value for each request using multiple threads:

INFO [2019/12/10 12:54:47.062805012] thread1 Process request #1, result: 6.765831 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062805881] thread1 Process request #2, result: 7.835833 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062806406] thread2 Process request #3, result: 2.800832 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062806903] thread3 Process request #5, result: 5.765831 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062807397] thread2 Process request #4, result: 1.784832 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062807877] thread2 Process request #7, result: 2.777844 (bin/app/processor.cpp:137)
INFO [2019/12/10 12:54:47.062808437] thread3 Process request #6, result: 3.783869 (bin/app/processor.cpp:137)

There's a lot of redundancy (the severity, the format string, the file path are printed several times), wasted space (the timestamp is represented using 29 bytes, where 8 bytes of information would be plenty), and loss of precision (the originally computed floating point result is lost, only its text representation remains). Furthermore, the log was expensive to produce (consider the string conversion of each timestamp and float value) and not trivial to parse using automated tools. Conventional log solutions also have to make a trade-off: either implement synchronous logging (possibly via slow locking) that ensures the log events are sorted by creation time, or asynchronous logging (without global locks) that usually produces unsorted output and typically prone to data loss if the application crashes.

Binlog solves these issues by using structured binary logs. The static parts (severity, format string, file and line, etc) are saved only once to the logfile. While logging, instead of converting the log arguments to text, they are simply copied as-is, and the static parts are referenced by a single identifier. This way the logfile becomes less redundant (static parts are only written once), smaller (e.g: timestamps are stored on fewer bytes without loss of precision), and possibly more precise (e.g: the exact float is saved, without having the representation interfering with the value). Because of to the smaller representation, binary logfiles are much faster to produce, saving precious cycles on the hot path of the application. Thanks to the structured nature, binary logfiles can be consumed by other programs, without reverting to fragile text processing methods.

Binary logfiles are not human readable, but they can be converted to text using the bread program. The format of the text log is configurable, independent of the logfile. bread can also efficiently sort the logs by time, taking advantage of its structured nature.

Asynchronous logging saves cycles on the hot path, but allows log events to remain in the internal buffers if the application crashes. Binlog solves this by providing a recovery tool, brecovery, that recovers stuck data from coredumps in a platform agnostic way.

For further information, please refer to the Documentation.

Features

  • Log (almost) anything:
    • Fundamentals (char, bool, integers, floats)
    • Containers (vector, deque, list, map, set and more)
    • Strings (const char pointer, string, string_view)
    • Pointers (raw, unique_ptr, shared_ptr and optionals)
    • Pairs and Tuples
    • Enums
    • Custom structures (when adapted)
  • High performance: log in a matter of tens of nanoseconds
  • Asynchronous logging via lockfree queues
  • Format strings with {} placeholders
  • Vertical separation of logs via custom categories
  • Horizontal separation of logs via runtime configurable severities
  • Sort logs by time while reading
  • Recover buffered log events from coredumps

Performance

The performance of creating log events is benchmarked using the Google Benchmark library. Benchmark code is located at test/perf. The benchmark measures the time it takes to create a timestamped log event with various log arguments (e.g: one integer, one string, three floats) in a single-producer, single-consumer queue (the logging is asynchronous), running in a tight loop. Different clocks are used to timestamp log events. no clock means constant 0, TSC clock uses the time stamp counter, while sys clock means std::chrono::system_clock::now() or clock_gettime. There's a bechmark poison cache, trying to guess the cache effect, that invalidates the data cache on each iteration by writing a large byte buffer. The timing includes the writing of the buffer.

The results reported below are a sample of several runs, only to show approximate performance, and not for direct comparison. Machine configuration: Intel Xeon E5-2698 2.20 GHz, RHEL 2.6.32.

Benchmark Std. Dev. Median
One integer (no clock) 0 ns 8 ns
One integer (TSC clock) 0 ns 9 ns
One integer (sys clock) 0 ns 34 ns
One integer (poison cache) 8 ns 740 ns
One string 0 ns 38 ns
Three floats 0 ns 30 ns

Install

See INSTALL.md. In general, Binlog requires a C++14 conforming platform. Binlog is built and tested on Linux, macOS and Windows, with various recent versions of GCC, Clang and MSVC.

Releases

Binlog is developed in a "live at head" fashion. Every commit that is merged into the main branch goes through the same pipeline that builds and tests the whole library on every supported platform. If it is in main, it is considered stable. Time to time, a GitHub release is created, for the convenience of those who are uncomfortable with using (or unable to use) untagged commits.

More Repositories

1

hobbes

A language and an embedded JIT compiler
C
1,162
star
2

modern-cpp-kafka

A C++ API for Kafka clients (i.e. KafkaProducer, KafkaConsumer, AdminClient)
C++
335
star
3

testplan

Testplan, a multi-testing framework, because unit tests can only go so far..
Python
179
star
4

Xpedite

A non-sampling profiler purpose built to measure and optimize performance of C++ low latency/real time systems
Python
150
star
5

treadmill

lightweight, container-based compute fabric
Python
141
star
6

optimus-cirrus

Scala
94
star
7

desktopJS

desktopJS provides a common API across multiple HTML5 containers. By programming to a common API, applications can target multiple HTML5 containers without change.
TypeScript
76
star
8

ComposeUI

A .NET Core based, WebView2 using UI Container for hybrid web-desktop applications
C#
60
star
9

pyfixmsg

pyfixmsg is a library for parsing, manipulating and serialising FIX messages, primarily geared towards testing.
Python
54
star
10

winss

A supervision suite for Windows
C++
53
star
11

SilverKing

Scalable, high-throughput storage and coordination
Java
43
star
12

dotnet-please

CLI tool aims to streamline some repetitive tasks around Visual Studio projects and solutions
C#
24
star
13

MSML

Jupyter Notebook
23
star
14

ts-mocking-bird

A fully type safe mocking, call verification and import replacement library for jasmine and jest
TypeScript
19
star
15

flats

C++
18
star
16

needle

A lightweight flexible Dependency Injection library for use with TypeScript projects.
TypeScript
16
star
17

Crossroads

.NET Core commandline tool packager for developers
C#
11
star
18

message-broker

The MessageBroker is a typescript library for providing asynchronous communication throughout your app
TypeScript
11
star
19

cpx-training

Super simple sample programs for Circuit Playground Express
JavaScript
10
star
20

followapp-core

Analytics backed, cloud-based technology solution to remind and follow up with underprivileged mothers living in slums for their children's critical immunization
Java
9
star
21

venn-core

Perl
8
star
22

Plug-in-AppDynamics-Data-Source

AppDynamics Grafana Data Source Plugin
TypeScript
7
star
23

treadmill-aws

Treadmill AWS specific module
Python
7
star
24

treadmill-pid1

supporting project for treadmill
C
6
star
25

Saturn

Quality Assurance Automation Toolkit
Java
6
star
26

ML-for-Good-Hackathon

5
star
27

treadmill-bind

supporting project for treadmill
C
5
star
28

gradle-plugin-application

Morgan Stanley's Application plugin for Gradle allows you to package your Java-based applications for distribution, much like Gradle's built-in Application plugin does, but in a more standard and more flexible way.
Java
4
star
29

treadmill-tktfwd

Supporting project for treadmill
C++
3
star
30

zookeeper-authorizers

Java
3
star
31

releaseherald

releaseherald is a utility that can generate release notes. The only help it needs from developers is to collect meaningful news fragments in a directory, and apply consistent tags.
Python
3
star
32

jenkins-private-node

Jenkins plugin to support private CI nodes using job name patterns
Java
2
star
33

journal

Journal is a distributed state store intended to use with Zookeeper and NFS. Journal can be used with any asynchronous API Server to keep track of long running requests or automation tools.
Python
1
star
34

followapp-utils

Java
1
star
35

battleships-player-python

Python
1
star
36

proton

Hackathon game server
Scala
1
star
37

battleships-player-js

JavaScript
1
star
38

treadmill-runtime

C++
1
star
39

eslint-plugin-externalincludes

Suite of @html-eslint ESLint rules for working with external references such as script src and link hrefs
JavaScript
1
star
40

battleships

TypeScript
1
star
41

venn-cli

Perl
1
star
42

laf

Little Admin Framework
Python
1
star