The project provides high-performance concurrency, enabling highly parallel computation.

Dispenso

Introduction

Latin: To dispense, distribute, manage

Dispenso is a library for working with sets of tasks in parallel. It provides mechanisms for thread pools, task sets, parallel for loops, futures, pipelines, and more. Dispenso is a well-tested C++14 library designed to have minimal dependencies (some dependencies are required for the tests and benchmarks) and to be clean under compiler sanitizers (ASAN, TSAN). It is currently used in dozens of projects and hundreds of C++ files at Meta (formerly Facebook).

Dispenso also aims to avoid major disruption at every release. A major version is created when a backward incompatibility is introduced, and a minor version is created when substantial features have been added or bugs have been fixed; the aim is to bump major versions only very rarely. That should make the project suitable for use from the main branch, or, if you need a stronger guarantee, you can base your code on a specific version.

Dispenso has the following features:

  • AsyncRequest: Asynchronous request/response facilities for lightweight constrained message passing
  • CompletionEvent: A notifiable event type with wait and timed wait
  • ConcurrentObjectArena: An object arena for fast allocation of objects of the same type
  • ConcurrentVector: A vector-like type with a superset of the TBB concurrent_vector API
  • for_each: Parallel version of std::for_each and std::for_each_n
  • Future: A futures implementation that strives for interface similarity with std::experimental::future, but with dispenso types as backing thread pools
  • OnceFunction: A lightweight function-like interface for void() functions that can only be called once
  • parallel_for: Parallel for loops over indices that can be blocking or non-blocking
  • pipeline: Parallel pipelining of workloads
  • PoolAllocator: A pool allocator with facilities to supply a backing allocation/deallocation, making this suitable for use with e.g. CUDA allocation
  • ResourcePool: A type that acts similar to a semaphore around guarded objects
  • RWLock: A minimal reader-writer spin lock that outperforms std::shared_mutex under low write contention
  • SmallBufferAllocator: An allocator that enables fast concurrent allocation for temporary objects
  • TaskSet: Sets of tasks that can be waited on together
  • ThreadPool: The backing thread pool type used by many other dispenso features
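To make one entry in the list concrete, here is a minimal sketch of what "a notifiable event type with wait and timed wait" can look like when built from the standard library alone. It is an illustration of the concept, not dispenso::CompletionEvent or its interface; the class name SimpleEvent and its member names are hypothetical.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical illustration only; this is not dispenso::CompletionEvent.
class SimpleEvent {
 public:
  // Mark the event as completed and wake every waiting thread.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Block until notify() has been called.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return done_; });
  }

  // Block for at most `timeout`; returns true if the event has completed.
  template <class Rep, class Period>
  bool waitFor(const std::chrono::duration<Rep, Period>& timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return done_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool done_ = false;
};
```

A producer thread would call notify() once its work is finished, while consumers call wait() or poll with waitFor() and a short timeout.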

Comparison of dispenso vs other libraries

TBB

TBB has significant overlap with dispenso, though TBB has more functionality, and is likely to continue having more utilities for some time. We chose to build and use dispenso for a few primary reasons:

  1. TBB is built on older C++ standards, and doesn't deal well with compiler sanitizers
  2. TBB lacks an interface for futures
  3. We wanted to ensure we could control performance and availability on non-Intel hardware

Dispenso is faster than TBB in some scenarios and slower in others. For example, with parallel for loops, dispenso tends to be faster for small and medium loops, and on par with TBB for large loops. When many loops can run independently of one another, dispenso shines and can perform significantly better than TBB. Anecdotally speaking, we have seen one workload with independent parallel for loops at Meta where porting to dispenso led to a 50% speedup.

OpenMP

OpenMP has very simple semantics for parallelizing simple for loops, but gets quite complex for more complicated loops and constructs. OpenMP wasn't as portable in the past, though the number of compilers supporting it is increasing. If not used carefully, nesting OpenMP constructs inside other threads (e.g. nested parallel for) can lead to a large number of threads, which can exhaust machines.

Performance-wise, dispenso tends to outperform simple OpenMP for loops for medium and large workloads, but OpenMP has a significant advantage for small loops. This is because it has direct compiler support and can understand the cost of the code it is running. This allows it to forgo running in parallel if the tradeoffs aren't worthwhile.
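The idea of skipping parallelism when it isn't worthwhile can be approximated by hand with a size cutoff. The sketch below is only an illustration using standard threads: it is neither how OpenMP makes the decision (which relies on compiler knowledge of the loop body) nor how dispenso schedules work. The function loopWithSerialCutoff and its parameters minParallelSize and maxThreads are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of dispenso or OpenMP. Calls body(i) for every
// i in [0, n). Small loops run serially on the calling thread; larger loops are
// split into contiguous chunks, one per worker thread.
template <class Body>
void loopWithSerialCutoff(
    std::size_t n, std::size_t minParallelSize, std::size_t maxThreads, Body body) {
  if (n == 0) {
    return;
  }
  if (n < minParallelSize || maxThreads <= 1) {
    // Below the cutoff, thread startup would likely cost more than it saves.
    for (std::size_t i = 0; i < n; ++i) {
      body(i);
    }
    return;
  }
  const std::size_t numThreads = std::min(n, maxThreads);
  const std::size_t chunk = (n + numThreads - 1) / numThreads;  // round up
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(n, begin + chunk);
    // body is shared by reference, so it must be safe to call concurrently.
    workers.emplace_back([begin, end, &body] {
      for (std::size_t i = begin; i < end; ++i) {
        body(i);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
}
```

A fixed threshold is a crude stand-in for real cost knowledge: the right value depends on how expensive each iteration is, which is exactly the information a compiler-integrated system can see and a library cannot.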

Folly

Folly is a library from Meta that has several concurrency utilities including thread pools and futures. The library has very good support for new C++ coroutines functionality, and makes writing asynchronous code (e.g. I/O) easy and performant. Folly as a library can be tricky to work with. For example, the forward/backward compatibility of code isn't a specific goal of the project.

Folly does not have a parallel loop concept, nor task sets and parallel pipelines. When comparing Folly's futures against dispenso's, dispenso tries to maintain an API that is closely matched to a combination of std::experimental::future and std::experimental::shared_future (dispenso's futures are all shared). Additionally, for compute-bound applications, dispenso's futures tend to be much faster and lighter-weight than Folly's.
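For readers unfamiliar with shared futures, the property that matters is that the result can be read more than once and through several copies. The sketch below shows this with the standard library's std::shared_future only; it does not use dispenso or Folly, and readTwice is a hypothetical name.

```cpp
#include <future>

// Hypothetical illustration using standard C++ only (no dispenso, no Folly).
int readTwice() {
  // Deferred launch: the lambda runs lazily on the first get(), on the calling thread.
  std::shared_future<int> shared =
      std::async(std::launch::deferred, [] { return 21; }).share();
  std::shared_future<int> copy = shared;  // copies refer to the same shared state
  // Unlike std::future, get() may be called repeatedly and on every copy.
  return shared.get() + copy.get();
}
```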

Grand central dispatch, new std C++ parallelism, others

We haven't done a strong comparison vs these other mechanisms. GCD is an Apple technology used by many people for Mac and iOS platforms, and there are ports to other platforms (though the mechanism for submitting closures is different). Much of the C++ parallel algorithms work is still TBD, but we would be very interested to enable dispenso to be a basis for parallelization of those algorithms. Additionally, we have interest in enabling dispenso to back the new coroutines interface. We'd be interested in any contributions people would like to make around benchmarking/summarizing other task parallelism libraries, and also integration with C++ parallel algorithms and coroutines.

When (currently) not to use dispenso

Dispenso isn't really designed for high-latency task offload; it works best for compute-bound tasks. Using the thread pool for networking, disk, or in cases with frequent TLB misses (really any scenario with kernel context switches) may result in less than ideal performance.

In these kernel context switch scenarios, dispenso::Future can be used with dispenso::NewThreadInvoker, which should be roughly equivalent to std::future in performance.

If you need async I/O, Folly is likely a good choice (though it still doesn't fix e.g. TLB misses).
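For reference, the std::future baseline mentioned above looks like the following in standard C++: each blocking call gets its own thread rather than occupying a pool worker. This sketch uses only the standard library and does not show dispenso::NewThreadInvoker itself, whose usage isn't covered here; blockingFetch and fetchOnNewThread are hypothetical names, and the sleep stands in for real I/O.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for a blocking call such as a network or disk read.
int blockingFetch(int id) {
  std::this_thread::sleep_for(std::chrono::milliseconds(5));
  return id * 2;
}

// std::launch::async runs the call as if on a new thread, so the caller is
// free to continue until it asks for the result with get().
std::future<int> fetchOnNewThread(int id) {
  return std::async(std::launch::async, blockingFetch, id);
}
```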

Documentation and Examples

Documentation can be found here

Here are some simple examples of what you can do in dispenso. See tests and benchmarks for more examples.

parallel_for

for(size_t j = 0; j < kLoops; ++j) {
  vec[j] = someFunction(j);
}

Becomes

dispenso::parallel_for(0, kLoops, [&vec] (size_t j) {
  vec[j] = someFunction(j);
});

TaskSet

void randomWorkConcurrently() {
  dispenso::TaskSet tasks(dispenso::globalThreadPool());
  tasks.schedule([&stateA]() { stateA = doA(); });
  tasks.schedule([]() { doB(); });
  // Do some work on current thread
  tasks.wait(); // After this, A, B done.
  tasks.schedule(doC);
  tasks.schedule([&stateD]() { doD(stateD); });
} // TaskSet's destructor waits for all scheduled tasks to finish

ConcurrentTaskSet

struct Node {
  int val;
  std::unique_ptr<Node> left, right;
};
void buildTree(dispenso::ConcurrentTaskSet& tasks, std::unique_ptr<Node>& node, int depth) {
  if (depth) {
    node = std::make_unique<Node>();
    node->val = depth;
    tasks.schedule([&tasks, &left = node->left, depth]() { buildTree(tasks, left, depth - 1); });
    tasks.schedule([&tasks, &right = node->right, depth]() { buildTree(tasks, right, depth - 1); });
  }
}
void buildTreeParallel() {
  std::unique_ptr<Node> root;
  dispenso::ConcurrentTaskSet tasks(dispenso::globalThreadPool());
  buildTree(tasks, root, 20);
  tasks.wait();  // tasks would also wait here in destructor if we omitted this line
}

Future

dispenso::Future<size_t> ThingProcessor::processThings() {
  auto expensiveFuture = dispenso::async([this]() {
    return processExpensiveThing(expensive_);
  });
  auto futureOfManyCheap = dispenso::async([this]() {
    size_t sum = 0; 
    for (auto &thing : cheapThings_) {
      sum += processCheapThing(thing);
    }
    return sum;
  });
  return dispenso::when_all(expensiveFuture, futureOfManyCheap).then([](auto &&tuple) {
    return std::get<0>(tuple).get() + std::get<1>(tuple).get();
  });
}

auto result = thingProc->processThings();
useResult(result.get());

ConcurrentVector

dispenso::ConcurrentVector<std::unique_ptr<int>> values;
dispenso::parallel_for(
  dispenso::makeChunkedRange(0, length, dispenso::ParForChunking::kStatic),
  [&values](int i, int end) {
    values.grow_by_generator(end - i, [i]() mutable { return std::make_unique<int>(i++); });
  });
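The generator passed to grow_by_generator above is a stateful lambda: it captures i by value and is marked mutable so that each call can return the current value and then advance it. The same pattern can be tried in isolation with std::generate_n on a plain std::vector. This is a single-threaded sketch of the generator idiom only, not of ConcurrentVector; makeRange is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical helper: builds a vector holding begin, begin + 1, ..., end - 1.
std::vector<std::unique_ptr<int>> makeRange(int begin, int end) {
  assert(begin <= end);
  std::vector<std::unique_ptr<int>> out;
  out.reserve(static_cast<std::size_t>(end - begin));
  // The lambda owns its own counter; `mutable` lets i++ modify that copy.
  std::generate_n(std::back_inserter(out), end - begin,
                  [i = begin]() mutable { return std::make_unique<int>(i++); });
  return out;
}
```

In the parallel version, each chunk gets its own generator starting at that chunk's first index, so no counter is shared between threads.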

Installing dispenso

Binary builds of dispenso are currently available for some Linux distributions, and can be installed using their respective package managers. If your distribution or platform is not on the list, see the next section for instructions to build it yourself.

Building dispenso

Install CMake

Internally at Meta, we use the Buck build system, but as that relies on a monorepo for relevant dependencies, we do not (yet) ship our BUCK build files. To enable easy use outside of Meta monorepos, we ship a CMake build. Improvements to the CMake build and build files for additional build systems are welcome, as are instructions for building on other platforms, including BSD variants, Windows+Clang, etc.

Fedora/RPM-based distros

sudo dnf install cmake

MacOS

brew install cmake

Windows

Install CMake from https://cmake.org/download/

Build dispenso

Linux and MacOS

  1. mkdir build && cd build
  2. cmake PATH_TO_DISPENSO_ROOT
  3. make -j

Windows

Install Build Tools for Visual Studio. All commands should be run from the Developer Command Prompt.

  1. mkdir build && cd build
  2. cmake PATH_TO_DISPENSO_ROOT
  3. cmake --build . --config Release

Install dispenso

Once built, the library can be installed by building the "install" target. Typically on Linux and MacOS, this is done with

make install

On Windows (this also works on any platform), instead do

cmake --build . --target install

Use an installed dispenso

Once installed, a downstream CMake project can be pointed to it by using CMAKE_PREFIX_PATH or Dispenso_DIR, either as an environment variable or CMake variable. All that is required to use the library is to link the imported CMake target Dispenso::dispenso, which might look like

find_package(Dispenso REQUIRED)
target_link_libraries(myDispensoApp Dispenso::dispenso)

This brings in all required include paths, library files to link, and any other properties to the myDispensoApp target (your library or application).

Building and running dispenso tests

To keep dependencies to an absolute minimum, we do not build tests or benchmarks by default, but only the core library. Building tests requires GoogleTest.

Build and run dispenso tests

Linux and MacOS

  1. mkdir build && cd build
  2. cmake PATH_TO_DISPENSO_ROOT -DDISPENSO_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release
  3. make -j
  4. ctest

Windows

All commands should be run from the Developer Command Prompt.

  1. mkdir build && cd build
  2. cmake PATH_TO_DISPENSO_ROOT -DDISPENSO_BUILD_TESTS=ON
  3. cmake --build . --config Release
  4. ctest

Building and running dispenso benchmarks

Dispenso has several benchmarks, and some of these can benchmark against OpenMP, TBB, and/or folly variants. If benchmarks are turned on via -DDISPENSO_BUILD_BENCHMARKS=ON, the build will attempt to find these libraries, and if found, will enable those variants in the benchmarks. It is important to note that none of these dependencies are dependencies of the dispenso library, but only the benchmark binaries.

The folly variant is turned off by default, because unfortunately it appears to be common to find build issues in many folly releases; note however that the folly code does run and provide benchmark data on our internal Meta platform.

OpenMP should already be available on most platforms that support it (it must be partially built into the compiler after all), but TBB can be had by e.g. sudo dnf install tbb-devel.

After you have the deps you want, you can build and run:

Linux and MacOS

  1. mkdir build && cd build
  2. cmake PATH_TO_DISPENSO_ROOT -DDISPENSO_BUILD_BENCHMARKS=ON -DCMAKE_BUILD_TYPE=Release
  3. make -j
  4. (e.g.) bin/once_function_benchmark

Windows

Not currently supported.

Benchmark Results

Here are some limited benchmark results. Unless otherwise noted, these were run on a dual Epyc Rome machine with 128 cores and 256 threads. One benchmark here was repeated on a Threadripper 2990WX with 32 cores and 64 threads.

Some additional notes about the benchmarks: Your mileage may vary based on compiler, OS/platform, and processor. These benchmarks were run with default glibc malloc, but use of tcmalloc or jemalloc can significantly boost performance, especially for ConcurrentVector growth operations (grow_by and push_back).

Known issues

None at present

TODO

  • Expand CircleCI continuous integration testing from Linux to include Mac and Windows, and also to run on ARM. Use TSAN and ASAN testing on available platforms.

License

The library is released under the MIT license, but also relies on the (excellent) moodycamel concurrentqueue library, which is released under the Simplified BSD and Zlib licenses. See the top of the source at dispenso/third-party/moodycamel/*.h for details.
