  • Language
    C++
  • License
    BSD 3-Clause "New...
  • Created over 9 years ago
  • Updated over 5 years ago


LightweighT Almost Lock-Less Oriented for C++ programs memory allocator


ltalloc

  • LightweighT Almost Lock-Less Oriented for C++ programs memory allocator
  • Automatically exported from code.google.com/p/ltalloc

Overview

  • Simple (yet very efficient) multi-threaded memory allocator based on free lists.
  • It is best suited for applications that make many small (<256B) memory allocations (as C++ STL containers usually do) from many simultaneously running threads.

Features

  • O(1) cost for alloc, free (for blocks of size <56KB)
  • Low fragmentation
  • Near zero size overhead for small allocations (no header per allocation, just one common 64-byte header for all blocks inside a 64KB chunk)
  • High efficiency and scalability for multi-threaded programs (almost lock-free: at most one spin-lock per 256 alloc/free calls for small allocations, even if all memory is allocated in one thread and then freed in another)

Usage

To use ltalloc in your C++ application, just add the ltalloc.cc source file to your project's list of source files. It overrides the global operators new and delete, which is a fully C++ standard-compliant way to replace almost all memory allocation routines in C++ applications (as the default allocators of STL containers call global operator new). If this approach does not suit you, there are other ways to plug ltalloc into your application as well. In fact, the ltalloc.cc source is written in C (the overriding of operators new/delete is disabled automatically if __cplusplus is not defined), so it can be compiled as either C or C++ code.
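As a rough illustration of why replacing the global operators is enough to capture container allocations, here is a minimal sketch. It does not use ltalloc itself: `backend_alloc`, `backend_free` and the `g_new_calls` counter are hypothetical names, and `std::malloc`/`std::free` merely stand in for the allocator's entry points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Counts calls so the effect of the replacement is observable.
// Not thread-safe; it is here for illustration only.
static std::size_t g_new_calls = 0;

// Hypothetical stand-ins for the allocator's entry points; a real build
// would route these to ltalloc instead of malloc/free.
static void* backend_alloc(std::size_t n) { return std::malloc(n ? n : 1); }
static void  backend_free(void* p)        { std::free(p); }

// Replacing these global functions redirects new-expressions and
// std::allocator in the whole program, including third-party code.
void* operator new(std::size_t n) {
    ++g_new_calls;
    if (void* p = backend_alloc(n)) return p;
    throw std::bad_alloc();
}
void* operator new(std::size_t n, const std::nothrow_t&) noexcept {
    ++g_new_calls;
    return backend_alloc(n);
}
void operator delete(void* p) noexcept { backend_free(p); }
void operator delete(void* p, std::size_t) noexcept { backend_free(p); }
void operator delete(void* p, const std::nothrow_t&) noexcept { backend_free(p); }

// Returns how many times the replaced operator new ran while a vector grew.
std::size_t count_vector_allocations() {
    const std::size_t before = g_new_calls;
    std::vector<int> v;
    for (int i = 0; i < 100; ++i) v.push_back(i);
    assert(v.size() == 100);
    return g_new_calls - before;
}
```

The array forms (`new[]`/`delete[]`) are not replaced here because their default versions forward to the functions above.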

Wiki

Introduction

Almost every C++ programmer knows about the opportunity to substitute a custom allocator for the default one in STL containers, but almost no one actually uses it. :)

And I agree that this feature becomes almost unusable when dealing with large enough real projects, especially when a lot of third-party C++ libraries are used, and you quickly realize that containers with different allocators are simply incompatible with each other.

After all, what are custom allocators (for containers) actually needed for?

I do not believe that control over memory allocation per container gives any real benefit. I mean that control over memory allocation should be exercised not per container, but per {thread, blocksize} pair. Otherwise, memory obtained from a custom allocator is not shared with other objects of the same size (which leads to wasted memory), and there are potential multi-threading issues. So, when you think about the usefulness of custom allocators, there are more questions than answers.

Then I thought: what if a specific pool could be chosen at compile time, when the size of the requested memory block is known beforehand? Then a single application-wide allocator could completely eliminate any need for custom allocators! This idea looks somewhat unrealistic, but I still decided to try implementing it.

Design Principles

  1. Inlining and compile-time size class calculation.

    When the code of the allocation function is small enough, the compiler can inline it to eliminate the call. Also, when the size of an object is known beforehand, it would be very good if the size class (i.e. the specific pool that satisfies the allocation request) could be chosen at compile time. To make this possible, the computation of the size class must rely only on built-in operators (no asm or external function calls) and must not access any dynamically calculated data. Finally, the application's source code should be compiled with link-time optimization turned on (/GL for MSVC, -flto for GCC/Clang, and -ipo for ICC) to make inlining of operator new calls possible. As a sample, here is the result of compiling the single statement "new std::array<int, 10>":

    Source Code

NOINLINE void *test_function()
{
    return new std::array<int, 10>;
}

void *operator new(size_t size) { return ltalloc<true>(size); }
void *operator new(size_t size, const std::nothrow_t&) { return ltalloc<false>(size); }

template <bool throw_> static void *ltalloc(size_t size)
{
    unsigned int sizeClass = get_size_class(size); //computed at compile-time
    ThreadCache *tc = &threadCache[sizeClass];
    FreeBlock *fb = tc->freeList;
    if (likely(fb))
    {
        tc->freeList = fb->next;
        tc->counter++;
        return fb;
    }
    else
        return fetch_from_central_cache<throw_>(size, tc, sizeClass);
}

MSVC 2012 compiler 32-bit asm output

mov         eax,dword ptr fs:[0000002Ch]
mov         edx,dword ptr [eax]
add         edx,128h ;296=sizeClass*sizeof(tc[0])
mov         eax,dword ptr [edx]
test        eax,eax
je          L1 ; probability is just about 1%
mov         ecx,dword ptr [eax]
inc         dword ptr [edx+8]
mov         dword ptr [edx],ecx
ret
 L1:
 push        18h ; =24 (size class)
 mov         ecx,28h ; =40 (bytes size)
 call        fetch_from_central_cache<1> (0851380h)
 add         esp,4
 ret

GCC 4.8.1 64-bit asm output

mov    rdx,0xffffffffffffe7a0
mov    rax,QWORD PTR fs:[rdx+0x240]
test   rax,rax
je     L1 ; prob 1%
mov    rcx,QWORD PTR [rax]
add    DWORD PTR fs:[rdx+0x250],0x1
mov    QWORD PTR fs:[rdx+0x240],rcx
ret
 L1:
 add    rdx,QWORD PTR fs:0x0
 mov    edi,0x28 ; =40 (bytes size)
 lea    rsi,[rdx+0x240]
 mov    edx,0x18 ; =24 (size class)
 jmp    <_Z24fetch_from_central_cache...>

As you can see, the "new array" statement takes just 9 asm instructions (or even 7 for GCC).

Here is another example: a function that does many allocations in a loop to create a singly-linked list of arrays:

Source code

NOINLINE void *create_list_of_arrays()
{
    struct node
    {
        node *next;
        std::array<int, 9> arr;
    } *p = NULL;

    for (int i=0; i<1000; i++)
    {
        node *n = new node;
        n->next = p;
        p = n;
    }

    return p;
}

VS

 mov         eax,dword ptr fs:[0000002Ch]
 push        ebx
 push        esi
 mov         esi,dword ptr [eax]
 push        edi
 xor         edi,edi
 add         esi,128h
 mov         ebx,3E8h    ; =1000
 L2:
mov         eax,dword ptr [esi]
test        eax,eax
je          L1 ; prob 1%
mov         ecx,dword ptr [eax]
inc         dword ptr [esi+8]
mov         dword ptr [esi],ecx
dec         ebx                  ; i++
mov         dword ptr [eax],edi  ; n->next = p;
mov         edi,eax              ; p = n;
jne         L2                   ; if (i<1000) goto L2
 pop         edi
 pop         esi
 pop         ebx
 ret
 L1:
 ...

GCC

 ...
 L2:
mov    r12,rax                      ; p = n;
mov    rax,QWORD PTR fs:[rbx+0x258]
test   rax,rax
je     L1 ; prob 1%
mov    rdx,QWORD PTR [rax]
add    DWORD PTR fs:[rbx+0x268],0x1
mov    QWORD PTR fs:[rbx+0x258],rdx
L3:
sub    ebp,0x1                      ; i++
mov    QWORD PTR [rax],r12          ; n->next = p;
jne    L2                           ; if (i<1000) goto L2
 add    rsp,0x8
 pop    rbx
 pop    rbp
 pop    r12
 pop    r13
 ret
 L1:
 mov    edx,0x19
 mov    rsi,r13
 mov    edi,0x30
 call   <_Z24fetch_from_central_cache...>
 jmp    L3

For this case, the compiler has optimized the whole "new node;" statement inside the loop down to a mere 6 asm instructions!

I think that the execution speed of this resulting asm code (generated for fairly general C++ code) can compete quite well with a good custom pool-based allocator implementation.

(Although inlining can give some performance improvement, it is not strictly necessary, and even a regular call to the ltalloc function will still work very fast.)

  2. Thread-efficiency and scalability.

    To achieve high multithreading efficiency, ltalloc uses an approach based on TCMalloc (I didn't take any code from TCMalloc, just the main idea). So, there is a per-thread cache (based on native thread_local variables), and all allocations (except the large ones, >56KB) are satisfied from the thread-local cache (just a simple singly linked list of free blocks per size class).

    If the free list of the thread cache is empty, then a batch (256 or fewer) of memory blocks is fetched from a central free list (a list of batches, shared by all threads) for this size class and placed in the thread-local free list, and one of the blocks of this batch is returned to the application. When an object is deallocated, it is inserted into the appropriate free list in the current thread's thread cache. If the thread cache free list now reaches a certain number of blocks (256 or fewer, depending on the block size), then the whole free list of blocks is moved back to the central list as a single batch.

    This simple batching approach alone gives enough scalability (i.e. with acceptably low contention) for theoretically up to a 128-core SMP system, provided that memory allocation operations are interleaved with at least 100 CPU cycles of other work (this is a rough average for a single operation of moving a batch to the central cache or fetching one from it). This approach is especially effective for the producer-consumer pattern, where memory is allocated in one thread and then released in another.
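The batching scheme described above can be sketched roughly as follows. This is a simplified illustration for a single size class, not ltalloc's actual code: the names (`CentralCache`, `cache_alloc`, `cache_free`, `kBatchSize`) are hypothetical, a `std::mutex` stands in for the spin-lock, `std::malloc` stands in for the system allocation, and memory is never returned to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

struct FreeBlock { FreeBlock* next; };

constexpr std::size_t kBlockSize = 64;    // one illustrative size class
constexpr std::size_t kBatchSize = 256;   // blocks moved per central-list visit

// Central list of whole batches, shared by all threads.
struct CentralCache {
    std::mutex lock;                  // stand-in for the spin-lock
    std::vector<FreeBlock*> batches;  // each entry heads a list of kBatchSize blocks
    std::size_t visits = 0;           // how often any thread had to take the lock
};
inline CentralCache& central() { static CentralCache c; return c; }

struct ThreadCache { FreeBlock* freeList = nullptr; std::size_t count = 0; };
inline ThreadCache& thread_cache() { thread_local ThreadCache tc; return tc; }

// Slow path: take one ready batch, or carve a fresh one from new memory.
inline FreeBlock* fetch_batch() {
    CentralCache& c = central();
    {
        std::lock_guard<std::mutex> guard(c.lock);
        ++c.visits;
        if (!c.batches.empty()) {
            FreeBlock* head = c.batches.back();
            c.batches.pop_back();
            return head;
        }
    }
    char* raw = static_cast<char*>(std::malloc(kBlockSize * kBatchSize));
    if (!raw) return nullptr;
    FreeBlock* head = nullptr;
    for (std::size_t i = kBatchSize; i-- > 0;) {   // link blocks back to front
        FreeBlock* b = reinterpret_cast<FreeBlock*>(raw + i * kBlockSize);
        b->next = head;
        head = b;
    }
    return head;
}

inline void* cache_alloc() {
    ThreadCache& tc = thread_cache();
    if (!tc.freeList) {                  // the only branch on the fast path
        tc.freeList = fetch_batch();
        if (!tc.freeList) return nullptr;
        tc.count = kBatchSize;
    }
    FreeBlock* b = tc.freeList;
    tc.freeList = b->next;
    --tc.count;
    return b;
}

inline void cache_free(void* p) {
    assert(p != nullptr);
    ThreadCache& tc = thread_cache();
    if (tc.count == kBatchSize) {        // list is full: hand it over as one batch
        CentralCache& c = central();
        std::lock_guard<std::mutex> guard(c.lock);
        ++c.visits;
        c.batches.push_back(tc.freeList);
        tc.freeList = nullptr;
        tc.count = 0;
    }
    FreeBlock* b = static_cast<FreeBlock*>(p);
    b->next = tc.freeList;
    tc.freeList = b;
    ++tc.count;
}
```

Because a thread only touches the shared list when its own list is empty or full, the lock is taken once per `kBatchSize` operations at most, no matter which thread frees the memory.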

  3. Compact layout.

    While most memory allocators store at least one pointer at the beginning (header) of each allocated memory block (so, for example, each request for a 16-byte (or even 13-byte) block actually consumes 32 bytes because of the 16B-alignment requirement), ltalloc instead keeps just a small header (64 bytes) per chunk (64KB by default), while all allocated blocks are stored contiguously inside the chunk without any interleaved metadata, which is much more efficient for small memory allocations.

    So, if there is no pointer at the beginning of each block, there must be another way to find the metadata for allocated objects. Some allocators solve this problem by keeping an sbrk pointer, but this has drawbacks: sbrk has to be emulated on systems that don't support it, and memory allocated up to the sbrk limit cannot be effectively returned to the system. So I decided to use another approach: all big blocks (obtained directly from the system) are always aligned to multiples of the chunk size, while blocks within any chunk are never chunk-aligned, unlike sysblocks. This check can be done with a simple if (uintptr_t(p)&(CHUNK_SIZE-1)), and the pointer to the chunk header is calculated as (uintptr_t)p & ~(CHUNK_SIZE-1). (A similar approach is used in jemalloc.)
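The two pointer checks quoted above can be tried in isolation with a small sketch like the one below. The `ChunkHeader` layout, the helper names and the use of `std::aligned_alloc` (C++17) in place of mmap/VirtualAlloc are assumptions made for illustration; the real header contents and block placement are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::uintptr_t CHUNK_SIZE = 64 * 1024;   // default chunk size from the text

// Hypothetical 64-byte header placed at the start of every chunk.
struct alignas(64) ChunkHeader {
    std::uint32_t blockSize;       // size of every block stored in this chunk
    unsigned char reserved[60];
};
static_assert(sizeof(ChunkHeader) == 64, "header is expected to occupy 64 bytes");

// True for a pointer inside a chunk; false for a chunk-aligned system block.
inline bool is_small_block(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & (CHUNK_SIZE - 1)) != 0;
}

// Clears the low bits of the address to reach the enclosing chunk's header.
inline ChunkHeader* chunk_of(void* p) {
    return reinterpret_cast<ChunkHeader*>(
        reinterpret_cast<std::uintptr_t>(p) & ~(CHUNK_SIZE - 1));
}

// Obtains one chunk aligned to CHUNK_SIZE (stand-in for a system allocation).
inline ChunkHeader* new_chunk(std::uint32_t blockSize) {
    void* mem = std::aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);
    if (!mem) return nullptr;
    ChunkHeader* h = static_cast<ChunkHeader*>(mem);
    h->blockSize = blockSize;
    return h;
}

// Address of the i-th block; in this sketch blocks simply follow the header,
// with no per-block metadata in between.
inline void* block_at(ChunkHeader* h, std::size_t i) {
    assert(sizeof(ChunkHeader) + (i + 1) * h->blockSize <= CHUNK_SIZE);
    char* base = reinterpret_cast<char*>(h) + sizeof(ChunkHeader);
    return base + i * h->blockSize;
}
```

Given any block pointer `b`, `chunk_of(b)->blockSize` recovers its size without reading anything stored next to the block itself.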

    Finally, the mapping of a block size to its size class is done by simply rounding up to the nearest "subpower" of two (i.e. 2^n, 1.25*2^n, 1.5*2^n, and 1.75*2^n by default, but this can be configured, and it can be reduced to exact power-of-two sizes), so there are 51 size classes (for all small blocks <56KB), and the size overhead (internal fragmentation) is no more than 25%, and 12% on average.
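To make the "subpower" rounding concrete, here is a small sketch that uses only built-in operators, so a C++14 compiler can evaluate it at compile time for constant sizes. It is an illustration under assumptions, not ltalloc's get_size_class: it returns the rounded block size rather than a class index, and `round_up_to_subpower` and `rounding_waste` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the nearest of 1, 1.25, 1.5 or 1.75 times a power of two.
constexpr std::size_t round_up_to_subpower(std::size_t size) {
    if (size == 0) size = 1;                      // assumption: treat 0 like 1
    std::size_t pow2 = 1;
    while (pow2 <= size / 2) pow2 *= 2;           // largest power of two <= size
    const std::size_t step = pow2 >= 4 ? pow2 / 4 : 1;   // quarter steps between powers
    return (size + step - 1) / step * step;       // round up to a multiple of step
}

// Evaluated by the compiler, with no runtime cost for constant sizes.
static_assert(round_up_to_subpower(40) == 40, "40 = 1.25 * 32");
static_assert(round_up_to_subpower(41) == 48, "48 = 1.5 * 32");
static_assert(round_up_to_subpower(1000) == 1024, "next power of two");
static_assert(round_up_to_subpower(1025) == 1280, "1280 = 1.25 * 1024");

// Bytes lost to rounding for one request (internal fragmentation).
inline std::size_t rounding_waste(std::size_t size) {
    assert(size > 0);
    return round_up_to_subpower(size) - size;
}
```

Since the step between neighbouring classes is a quarter of the enclosing power of two, the waste for any request stays below a quarter of the requested size, which matches the 25% bound stated above.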

    As a free bonus, this approach combined with contiguous block placement gives a "perfect alignment feature" for all returned memory pointers (see below).

FAQ

  1. Is ltalloc faster than all other general purpose memory allocators?

    Yes, of course; why else would I start writing my own memory allocator? :-)

    But, joking aside, let's look at the performance comparison table below (the results were obtained with this simple test, which just continuously allocates and frees memory blocks of 128 bytes from simultaneously running threads).

    (Results are given in millions of operations (pairs of alloc+free) per second for a single thread, i.e. to obtain the total number of operations/sec, multiply the corresponding result by the number of threads.)

    image

    Here is a chart for 2x Xeon E5620/Debian:

    image

    While this test is completely synthetic (and may be too biased), it measures precisely the allocation/deallocation cost alone, excluding the influence of everything else, such as cache misses (which are very important, but not always). So even this benchmark can be quite representative for some applications with a small working set (one that fits entirely inside the CPU cache), or for some specific algorithms.

  2. What makes ltalloc so fast?

    Briefly: its extremely minimalistic design and highly polished implementation, especially the minimization of conditional branches in a regular alloc call. Consider a typical implementation of a memory allocation function:

if (size == 0) return NULL (or size = 1)
if (!initialized) initialize_allocator()
if (size < some_threshold) (to test if size requested is small enough)
if (freeList) {result = freeList, freeList = freeList->next}
if (result == NULL) throw std::bad_alloc() (for an implementation of operator new)

But in the case of a call to operator new overridden via ltalloc, there will be just one conditional branch (the 4th in the list above) in 99% of cases, while all other checks are done only when necessary, in the remaining 1% of cases.

  3. Does ltalloc return memory to the system automatically?

    No, it does not (except for large blocks).

    But you can always call ltalloc_squeeze() manually at any time (e.g., in a separate thread), which has almost no impact on the performance of allocations/deallocations in other simultaneously running threads (except for the obvious cost of having to re-obtain memory from the system when allocating new memory afterwards). This function can release as much memory as possible (not only at the top of the heap, as malloc_trim does).

    I don't want to do this automatically, because it depends heavily on the application's memory allocation pattern (e.g., imagine a server app that periodically (say, once a minute) has to process a complex user request as quickly as possible, and afterwards destroys all objects used for processing; returning any memory to the system in this case may significantly degrade the performance of object allocation on each new request). I also dislike customizable threshold parameters, because they are usually hard for the end user to tune optimally, and they add overhead, since some additional checks have to be done inside alloc/free calls (not necessarily on every call, but sometimes). So, instead, I've just provided a mechanism to release memory manually at the most appropriate time for a specific application (e.g. when the user is inactive, or right after closing a subwindow/tab).

    But if you really want this, you can run a separate thread that just periodically calls ltalloc_squeeze(0). Here is a one-liner for C++11:

std::thread([] {for (;;ltsqueeze(0)) std::this_thread::sleep_for(std::chrono::seconds(3));}).detach();

  4. Why are no memory statistics provided by the allocator?

    Because they cause additional overhead, and I don't see any reason to include this sort of thing in such a simple allocator.

    Statistics would have to be enabled by some preprocessor macro anyway, so you can just as well take any suitable malloc implementation and optionally hook it up in place of ltalloc with preprocessor directives like this:

#ifdef ENABLE_ADDITIONAL_MEMORY_INFO
#include "some_malloc.cxx"
#else
#include "ltalloc.cc"
#endif

  5. Why is there no separate function to allocate aligned memory (like aligned_alloc)?

    Just because it's not needed! :)

    ltalloc implicitly implements a "perfect alignment feature" for all returned memory pointers, simply because of its design.

    All allocated memory blocks are automatically aligned according to the requested size, i.e. the alignment of any allocation is at least pow(2, CountTrailingZeroBits(objectSize)) bytes. E.g., 4-byte blocks are always 4-byte aligned, 24-byte blocks are 8B-aligned, 1024-byte blocks are 1024B-aligned, 1280-byte blocks are 256B-aligned, and so on.

    (Remember that in C/C++ the size of a struct is always a multiple of the alignment of its most strictly aligned member, so for example sizeof(struct {__m128 a; char s[4];}) = 32, not 20 (16+4)! So, for any struct S, "new S" will always return a suitably aligned pointer.)

    So, if you need 4KB-aligned memory, just request a size of (desired_size+4095)&~4095 bytes (the description of the aligned_alloc function in the C11 standard already states that the value of size shall be an integral multiple of alignment, so ltalloc() can safely be called in place of aligned_alloc() even without an additional argument to specify the alignment).

    But to be completely honest, this "perfect alignment" breaks once the block size exceeds the chunk size: all larger blocks are aligned to the chunk size (which is 64KB by default, so this generally shouldn't be an issue).
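The alignment rule above is plain bit arithmetic and can be checked with a short sketch. The helper names (`guaranteed_alignment`, `size_for_alignment`, `is_aligned`) are hypothetical and not part of ltalloc, `alignas(16) float[4]` stands in for `__m128`, and C++14 is assumed for the constexpr functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t CHUNK_SIZE = 64 * 1024;   // default chunk size from the text

// Largest power of two dividing `size` (size > 0), i.e.
// pow(2, CountTrailingZeroBits(size)), capped at the chunk size.
constexpr std::size_t guaranteed_alignment(std::size_t size) {
    const std::size_t lowest_bit = size & (~size + 1);   // isolates the lowest set bit
    return lowest_bit < CHUNK_SIZE ? lowest_bit : CHUNK_SIZE;
}

// Size to request so that the block is aligned to `alignment` (a power of two);
// the general form of (desired_size + 4095) & ~4095.
constexpr std::size_t size_for_alignment(std::size_t desired, std::size_t alignment) {
    return (desired + alignment - 1) & ~(alignment - 1);
}

// Runtime check that a pointer honours a power-of-two alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// sizeof is padded to a multiple of the struct's alignment, so the size alone
// already carries the alignment requirement.
struct Vec4AndTag { alignas(16) float v[4]; char s[4]; };
static_assert(sizeof(Vec4AndTag) == 32, "16 + 4 bytes are padded up to 32");
static_assert(guaranteed_alignment(sizeof(Vec4AndTag)) >= alignof(Vec4AndTag),
              "size-derived alignment covers the type's requirement");

// The examples from the text.
static_assert(guaranteed_alignment(4) == 4, "");
static_assert(guaranteed_alignment(24) == 8, "");
static_assert(guaranteed_alignment(1024) == 1024, "");
static_assert(guaranteed_alignment(1280) == 256, "");
```

For instance, `size_for_alignment(5000, 4096)` yields 8192, and `guaranteed_alignment(8192)` is 8192, so a request of that size satisfies a 4KB alignment requirement under the rule stated above.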

    Here is a complete table for all allocation sizes and corresponding alignment (for 32-bit platform):

    image

    Blocks of size greater than 57344 bytes are allocated directly from the system (actual consumed physical memory is a multiple of page size (4K), but virtual is a multiple of alignment - 65536 bytes).

  6. Why are big allocations not cached, but always requested directly from the system?

    Actually, I don't think that caching big allocations can give a significant performance improvement in real usage, as simple time measurements show that allocating even 64K of memory directly with VirtualAlloc or mmap is faster (2-15x depending on the system) than a simple memset to zero of that allocated memory (except for the first time, which takes 4-10x longer because of physical page allocation on first access). And, obviously, for larger allocation sizes the overhead of the system call is even less noticeable. However, if this really matters for your application, just increase the constant parameter CHUNK_SIZE to the desired value.

Usage: GNU/Linux

  1. gcc /path/to/ltalloc.cc ...
  2. gcc ... /path/to/libltalloc.a
  3. LD_PRELOAD=/path/to/libltalloc.so <appname> [<args...>]

To use options 2 and 3 you should build libltalloc:

hg clone https://code.google.com/p/ltalloc/
cd ltalloc/gnu.make.lib
make

(the libltalloc.a and libltalloc.so files are then created in the current directory)

With these options (2 or 3), all malloc/free routines (calloc, posix_memalign, etc.) are redirected to ltalloc.

Also be aware that GCC, when using options -flto and -O3 with option 2, will not inline calls to malloc/free unless you also add the options -fno-builtin-malloc and -fno-builtin-free (however, this is a rather small performance issue and is not necessary for correct operation).

Usage: Windows

Unfortunately, there is no simple way to override all malloc/free CRT function calls under Windows. So far there is only one simple option to override almost all memory allocations in C++ programs, via the global operator new override: just add the ltalloc.cc file to your project and you are done.

ltalloc was successfully compiled with MSVC 2008/2010/2012, GCC 4.*, Intel Compiler 13, and Clang 3.*, but its source code is very simple, so it can be trivially ported to any other C or C++ compiler with native thread-local variable support. (Warning: in some builds of MinGW there is a problem with emutls and the order of execution of the thread destructor (all thread-local variables are destructed before it), and termination of any thread will lead to an application crash.)

Changelog

  • v2.0.0 (2015/06/16)
    • ltcalloc(), ltmsize(), ltrealloc(), ltmemalign(), LTALLOC_AUTO_GC_INTERVAL
  • v1.0.0 (2015/06/16)
  • v0.0.0 (2013/xx/xx)
    • Fork from public repository
