Repository Details

Cinder is Meta's internal performance-oriented production version of CPython.

Welcome to Cinder!

Cinder is Meta's internal performance-oriented production version of CPython 3.10. It contains a number of performance optimizations, including bytecode inline caching, eager evaluation of coroutines, a method-at-a-time JIT, and an experimental bytecode compiler that uses type annotations to emit type-specialized bytecode that performs better in the JIT.

Cinder powers Instagram, where it started, and is used by a growing number of Python applications at Meta.

For more information on CPython, see README.cpython.rst.

Is this supported?

Short answer: no.

We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

Cinder is not polished or documented for anyone else's use. We don't have the desire for it to become an alternative to CPython. Our goal in making this code available is a unified faster CPython. So while we do run Cinder in production, if you choose to do so you are on your own. We can't commit to fixing external bug reports or reviewing pull requests. We make sure Cinder is sufficiently stable and fast for our production workloads, but we make no assurances about its stability or correctness or performance for any external workloads or use-cases.

That said, if you have experience in dynamic language runtimes and have ideas to make Cinder faster; or if you work on CPython and want to use Cinder as inspiration for improvements in CPython (or help upstream parts of Cinder to CPython), please reach out; we'd love to chat!

How do I build it?

Cinder should build just like CPython; configure and make -j. However, as most development and usage of Cinder occurs in the highly specific context of Meta, we do not exercise it much in other environments. As such, the most reliable way to build and run Cinder is to re-use the Docker-based setup from our GitHub CI workflow.

If you just want to get a working Cinder without building it yourself, our Runtime Docker Image is going to be the easiest (no repo clone needed!):

  1. Install and set up Docker.
  2. Fetch and run our cinder-runtime image:
    docker run -it --rm ghcr.io/facebookincubator/cinder-runtime:cinder-3.10

If you want to build it yourself:

  1. Install and set up Docker.
  2. Clone the Cinder repo:
    git clone https://github.com/facebookincubator/cinder
  3. Run a shell in the Docker environment used by the CI:
    docker run -v "$PWD/cinder:/vol" -w /vol -it --rm ghcr.io/facebookincubator/cinder/python-build-env:latest bash
    The above command does the following:
    • Downloads (if not already cached) a pre-built Docker image used by the CI from https://ghcr.io/facebookincubator/cinder/python-build-env.
    • Makes the Cinder checkout above ($PWD/cinder) available to the Docker environment at the mount point /vol.
    • Interactively (-it) runs bash in the /vol directory.
    • Removes the container after it exits (--rm) to avoid disk bloat.
  4. Build Cinder from the shell started in the Docker environment:
    ./configure && make

Please be aware that Cinder is only built or tested on Linux x64; anything else (including macOS) probably won't work. The Docker image above is Fedora Linux-based and built from a Docker spec file in the Cinder repo: .github/workflows/python-build-env/Dockerfile.

There are some new test targets that might be interesting. make testcinder is pretty much the same as make test except that it skips a few tests that are problematic in our dev environment. make testcinder_jit runs the test suite with the JIT fully enabled, so all functions are JIT'ed. make testruntime runs a suite of C++ gtest unit tests for the JIT. And make test_strict_module runs a test suite for strict modules (see below).

Note that these steps produce a Cinder Python binary without PGO/LTO optimizations enabled, so don't expect to use these instructions to get any speedup on any Python workload.

How do I explore it?

Cinder Explorer is a live playground where you can see how Cinder compiles Python code from source to assembly. You're welcome to try it out! Feel free to file feature requests and bug reports. Keep in mind that the Cinder Explorer, like the rest of this, is "supported" on a best-effort basis.

What's here?

Immortal Instances

Instagram uses a multi-process webserver architecture; the parent process starts, performs initialization work (e.g. loading code), and forks tens of worker processes to handle client requests. Worker processes are restarted periodically for a number of reasons (e.g. memory leaks, code deployments) and have a relatively short lifetime. In this model, the OS must copy the entire page containing an object that was allocated in the parent process when the object's reference count is modified. In practice, the objects allocated in the parent process outlive workers; all the work related to reference counting them is unnecessary.

Instagram has a very large Python codebase and the overhead due to copy-on-write from reference counting long-lived objects turned out to be significant. We developed a solution called "immortal instances" to provide a way to opt objects out of reference counting. See Include/object.h for details. This feature is controlled by defining Py_IMMORTAL_INSTANCES and is enabled by default in Cinder. This was a large win for us in production (~5%), but it makes straight-line code slower: reference counting operations occur frequently, and when this feature is enabled each one must first check whether the object participates in reference counting.
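The trade-off can be illustrated with a toy model in plain Python. This is only a sketch, not Cinder's implementation (which is in C; see Include/object.h). The names ToyObject, IMMORTAL_BIT, incref and decref are hypothetical and chosen for illustration, and the writes counter stands in for memory writes that would dirty a page shared with the parent process.

```python
# Toy model of opt-out reference counting. All names are hypothetical;
# this does not reflect how Cinder encodes immortality.
IMMORTAL_BIT = 1 << 62  # assumed marker bit inside the refcount field


class ToyObject:
    def __init__(self):
        self.refcnt = 1
        self.writes = 0  # header writes that would trigger copy-on-write

    def make_immortal(self):
        self.refcnt |= IMMORTAL_BIT


def incref(obj):
    # This extra branch is the cost paid by straight-line code.
    if obj.refcnt & IMMORTAL_BIT:
        return
    obj.refcnt += 1
    obj.writes += 1


def decref(obj):
    if obj.refcnt & IMMORTAL_BIT:
        return
    obj.refcnt -= 1
    obj.writes += 1


shared = ToyObject()   # imagine this was allocated in the parent process
shared.make_immortal()
private = ToyObject()  # an ordinary, reference-counted object

for obj in (shared, private):
    incref(obj)
    decref(obj)
```

After the loop, shared.writes is still 0 while private.writes is 2: the immortal object's header is never written, at the price of one extra check per operation.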

Shadowcode

"Shadowcode" or "shadow bytecode" is our implementation of a specializing interpreter. It observes particular optimizable cases in the execution of generic Python opcodes and (for hot functions) dynamically replaces those opcodes with specialized versions. The core of shadowcode lives in Python/shadowcode.c, though the implementations for the specialized bytecodes are in Python/ceval.c with the rest of the eval loop. Shadowcode-specific tests are in Lib/test/test_shadowcode.py.

It is similar in spirit to the specializing adaptive interpreter (PEP 659) that was added in CPython 3.11.
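The general idea of a specializing interpreter can be sketched in plain Python. The following is a toy model under stated assumptions, not shadowcode itself: the class LoadAttrSite, the HOT_THRESHOLD value and the choice to specialize only instance-dictionary lookups are all hypothetical.

```python
HOT_THRESHOLD = 3  # hypothetical warm-up count before specializing


class LoadAttrSite:
    """One attribute-load site that specializes itself once it is hot."""

    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.cached_type = None    # guard for the specialized path
        self.specialized_runs = 0  # bookkeeping for the example only

    def execute(self, obj):
        # Specialized path: a cheap type guard, then a direct dict lookup.
        if self.cached_type is type(obj):
            try:
                value = obj.__dict__[self.name]
            except KeyError:
                self.cached_type = None  # guard failed: fall back to generic
            else:
                self.specialized_runs += 1
                return value

        # Generic path: the full attribute lookup protocol.
        value = getattr(obj, self.name)
        self.hits += 1
        in_instance_dict = self.name in getattr(obj, "__dict__", {})
        if (
            self.hits >= HOT_THRESHOLD
            and in_instance_dict
            and not hasattr(type(obj), self.name)  # no class-level override
        ):
            self.cached_type = type(obj)
        return value


class Point:
    def __init__(self, x):
        self.x = x


site = LoadAttrSite("x")
results = [site.execute(Point(i)) for i in range(5)]
```

The first three executions take the generic path; once the site is hot, the remaining two use the guarded fast path.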

Await-aware function calls

The Instagram Server is an async-heavy workload, where each web request may trigger hundreds of thousands of async tasks, many of which can be completed without suspension (e.g. thanks to memoized values).

We extended the vectorcall protocol to pass a new flag, Ci_Py_AWAITED_CALL_MARKER, indicating the caller is immediately awaiting this call.

When used with async function calls that are immediately awaited, we can immediately (eagerly) evaluate the called function, up to completion, or up to its first suspension. If the function completes without suspending, we are able to return the value immediately, with no extra heap allocations.

When used with async gather, we can immediately (eagerly) evaluate the set of passed awaitables, potentially avoiding the cost of creation and scheduling of multiple tasks for coroutines that could be completed synchronously, completed futures, memoized values, etc.

These optimizations resulted in a significant (~5%) CPU efficiency improvement.

This is mostly implemented in Python/ceval.c. Look for uses of the IS_AWAITED() macro and the Ci_Py_AWAITED_CALL_MARKER vectorcall flag.
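The effect of eager evaluation can be approximated in stock Python by stepping a coroutine manually. This is a minimal sketch of the concept, not of Cinder's C implementation; run_eagerly, lookup and _cache are hypothetical names.

```python
import asyncio


def run_eagerly(coro):
    """Step a coroutine until it finishes or first suspends.

    Returns (True, result) if it completed without suspending, otherwise
    (False, yielded), where `yielded` is what the coroutine suspended on.
    """
    try:
        yielded = coro.send(None)
    except StopIteration as stop:
        return True, stop.value
    return False, yielded


_cache = {"answer": 42}


async def lookup(key):
    if key in _cache:       # memoized: completes without suspending
        return _cache[key]
    await asyncio.sleep(0)  # otherwise suspends at least once
    return None


# Completes synchronously, so no task or event loop is needed.
done, value = run_eagerly(lookup("answer"))

# Suspends on the first await; a real caller would now schedule it as a task.
pending = lookup("missing")
suspended_done, _ = run_eagerly(pending)
pending.close()  # tidy up the half-run coroutine in this example
```

In the memoized case the result is available immediately; only the second call would need to be handed to the event loop.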

The Cinder JIT

The Cinder JIT is a method-at-a-time custom JIT implemented in C++. It is enabled via the -X jit flag or the PYTHONJIT=1 environment variable. It supports almost all Python opcodes, and can achieve 1.5-4x speed improvements on many Python performance benchmarks.

By default when enabled it will JIT-compile every function that is ever called, which may well make your program slower, not faster, due to overhead of JIT-compiling rarely-called functions. The option -X jit-list-file=/path/to/jitlist.txt or PYTHONJITLISTFILE=/path/to/jitlist.txt can point it to a text file containing fully qualified function names (in the form path.to.module:funcname or path.to.module:ClassName.method_name), one per line, which should be JIT-compiled. We use this option to compile only a set of hot functions derived from production profiling data. (A more typical approach for a JIT would be to dynamically compile functions as they are observed to be called frequently. It hasn't yet been worth it for us to implement this, since our production architecture is a pre-fork webserver, and for memory sharing reasons we wish to do all of our JIT compiling up front in the initial process before workers are forked, which means we can't observe the workload in-process before deciding which functions to JIT-compile.)
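A JIT list in the format described above could be generated from function objects. The sketch below assumes that a function's `__module__` and `__qualname__` line up with the path.to.module:funcname form for top-level functions and methods; nested functions (whose qualified names contain `<locals>`) are not handled. jitlist_entry and write_jitlist are hypothetical helpers, and the two json functions merely stand in for a profiling-derived hot set.

```python
import json
import os
import tempfile


def jitlist_entry(func):
    """Format one function as module:qualified_name."""
    return f"{func.__module__}:{func.__qualname__}"


def write_jitlist(funcs, path):
    """Write one entry per line, de-duplicated and sorted; return the entries."""
    entries = sorted({jitlist_entry(f) for f in funcs})
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(entries) + "\n")
    return entries


# Stand-ins for functions identified as hot by profiling.
hot_functions = [json.dumps, json.JSONEncoder.encode]

with tempfile.TemporaryDirectory() as tmp:
    jitlist_path = os.path.join(tmp, "jitlist.txt")
    entries = write_jitlist(hot_functions, jitlist_path)
    with open(jitlist_path, encoding="utf-8") as fh:
        written = fh.read()
```

The resulting file would then be passed to the interpreter with -X jit-list-file=/path/to/jitlist.txt or PYTHONJITLISTFILE.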

The JIT lives in the Jit/ directory, and its C++ tests live in RuntimeTests/ (run these with make testruntime). There are also some Python tests for it in Lib/test/test_cinderjit.py; these aren't meant to be exhaustive, since we run the entire CPython test suite under the JIT via make testcinder_jit; they cover JIT edge cases not otherwise found in the CPython test suite.

See Jit/pyjit.cpp for some other -X options and environment variables that influence the behavior of the JIT. There is also a cinderjit module defined in that file which exposes some JIT utilities to Python code (e.g. forcing a specific function to compile, checking if a function is compiled, disabling the JIT). Note that cinderjit.disable() only disables future compilation; it immediately compiles all known functions and keeps existing JIT-compiled functions.

The JIT first lowers Python bytecode to a high-level intermediate representation (HIR); this is implemented in Jit/hir/. HIR maps reasonably closely to Python bytecode, though it is a register machine instead of a stack machine, it is a bit lower level, it is typed, and some details that are obscured by Python bytecode but important for performance (notably reference counting) are exposed explicitly in HIR. HIR is transformed into SSA form, some optimization passes are performed on it, and then reference counting operations are automatically inserted into it according to metadata about the refcount and memory effects of HIR opcodes.
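To make the stack-machine versus register-machine distinction concrete, here is a toy lowering pass in Python. It is only an illustration under simplifying assumptions: it handles four opcodes and straight-line code, the output instruction names are made up for this example, and it does not model Cinder's actual HIR, types or reference counting.

```python
def lower_to_registers(stack_code):
    """Translate toy stack bytecode into three-address register instructions."""
    stack = []  # compile-time model of the operand stack (holds register names)
    out = []
    counter = 0

    def fresh():
        nonlocal counter
        name = f"v{counter}"
        counter += 1
        return name

    for op, arg in stack_code:
        if op == "LOAD_FAST":
            dst = fresh()
            out.append(f"{dst} = LoadLocal {arg}")
            stack.append(dst)
        elif op == "LOAD_CONST":
            dst = fresh()
            out.append(f"{dst} = LoadConst {arg!r}")
            stack.append(dst)
        elif op == "BINARY_ADD":
            rhs = stack.pop()
            lhs = stack.pop()
            dst = fresh()
            out.append(f"{dst} = Add {lhs}, {rhs}")
            stack.append(dst)
        elif op == "RETURN_VALUE":
            out.append(f"Return {stack.pop()}")
        else:
            raise ValueError(f"unsupported toy opcode: {op}")
    return out


# Stack bytecode for `return x + 1`.
program = [
    ("LOAD_FAST", "x"),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
register_code = lower_to_registers(program)
```

Because every result gets a fresh register, each name is assigned exactly once, which is the property SSA form generalizes to code with branches.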

HIR is then lowered to a low-level intermediate representation (LIR), which is an abstraction over assembly, implemented in Jit/lir/. In LIR we do register allocation, some additional optimization passes, and then finally LIR is lowered to assembly (in Jit/codegen/) using the excellent asmjit library.

The JIT is in its early stages. While it can already eliminate interpreter loop overhead and offers significant performance improvements for many functions, we've only begun to scratch the surface of possible optimizations. Many common compiler optimizations are not yet implemented. Our prioritization of optimizations is largely driven by the characteristics of the Instagram production workload.

Strict Modules

Strict modules is a few things rolled into one:

1. A static analyzer capable of validating that executing a module's top-level code will not have side effects visible outside that module.

2. An immutable StrictModule type usable in place of Python's default module type.

3. A Python module loader capable of recognizing modules opted in to strict mode (via an import __strict__ at the top of the module), analyzing them to validate no import side effects, and populating them in sys.modules as a StrictModule object.
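The second item, an immutable module type, can be approximated in stock Python. The sketch below is a hypothetical FrozenModule class, not Cinder's StrictModule: it only blocks attribute rebinding and deletion through normal attribute access, and writes through the module's `__dict__` would still succeed.

```python
import types


class FrozenModule(types.ModuleType):
    """Toy module type that rejects attribute rebinding after creation."""

    def __init__(self, name, namespace):
        super().__init__(name)
        # Populate via __dict__ so our own __setattr__ is not triggered.
        self.__dict__.update(namespace)

    def __setattr__(self, attr, value):
        raise AttributeError(
            f"cannot modify attribute {attr!r} of frozen module {self.__name__!r}"
        )

    def __delattr__(self, attr):
        raise AttributeError(
            f"cannot delete attribute {attr!r} of frozen module {self.__name__!r}"
        )


config = FrozenModule("config", {"DEBUG": False})

error = None
try:
    config.DEBUG = True  # rejected: the module is immutable
except AttributeError as exc:
    error = str(exc)
```

Knowing that a module's globals cannot be rebound is what lets a compiler treat names in that module as fixed.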

Static Python

Static Python is a bytecode compiler that makes use of type annotations to emit type-specialized and type-checked Python bytecode. Used along with the Cinder JIT, it can deliver performance similar to MyPyC or Cython in many cases, while offering a pure-Python developer experience (normal Python syntax, no extra compilation step). Static Python plus Cinder JIT achieves 18x the performance of stock CPython on a typed version of the Richards benchmark. At Instagram we have successfully used Static Python in production to replace all Cython modules in our primary webserver codebase, with no performance regression.

The Static Python compiler is built on top of the Python compiler module that was removed from the standard library in Python 3 and has since been maintained and updated externally; this compiler is incorporated into Cinder in Lib/compiler. The Static Python compiler is implemented in Lib/compiler/static/, and its tests are in Lib/test/test_compiler/test_static.py.

Classes defined in Static Python modules are automatically given typed slots (based on inspection of their typed class attributes and annotated assignments in __init__), and attribute loads and stores against instances of these types use new STORE_FIELD and LOAD_FIELD opcodes, which in the JIT become direct loads/stores from/to a fixed memory offset in the object, with none of the indirection of a LOAD_ATTR or STORE_ATTR. Classes also gain vtables of their methods, for use by the INVOKE_* opcodes mentioned below. The runtime support for these features is located in Include/classloader.h and Python/classloader.c.

A static Python function begins with a new CHECK_ARGS opcode which checks that the supplied arguments' types match the type annotations, and raises TypeError if not. Calls from a static Python function to another static Python function will skip this opcode (since the types are already validated by the compiler). Static to static calls can also avoid much of the overhead of a typical Python function call. We emit an INVOKE_FUNCTION or INVOKE_METHOD opcode which carries with it metadata about the called function or method; this plus optionally immutable modules (via StrictModule) and types (via cinder.freeze_type(), which we currently apply to all types in strict and static modules in our import loader, but in future may become an inherent part of Static Python) and compile-time knowledge of the callee signature allow us to (in the JIT) turn many Python function calls into direct calls to a fixed memory address using the x64 calling convention, with little more overhead than a C function call.
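A rough pure-Python analogue of the argument check is a decorator that validates arguments against annotations. This is a sketch for intuition only: check_args and the unchecked attribute are hypothetical, only plain-class annotations are checked, and in Static Python the check is an opcode, not a wrapper function.

```python
import functools
import inspect


def check_args(func):
    """Raise TypeError when an argument does not match its class annotation."""
    sig = inspect.signature(func)
    skip_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    # Only plain classes are checked; anything else is treated as dynamic.
    checked = {
        name: p.annotation
        for name, p in sig.parameters.items()
        if p.annotation is not inspect.Parameter.empty
        and isinstance(p.annotation, type)
        and p.kind not in skip_kinds
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = checked.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}() argument {name!r} expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    # Entry point that skips the check, analogous to a static-to-static call
    # whose argument types were already validated at compile time.
    wrapper.unchecked = func
    return wrapper


@check_args
def scale(x: int, factor: float) -> float:
    return x * factor


ok = scale(2, 1.5)

message = None
try:
    scale("2", 1.5)  # wrong type for x
except TypeError as exc:
    message = str(exc)
```

The unchecked attribute mirrors the idea that callers whose types are already proven can bypass the check entirely.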

Static Python is still gradually typed, and supports code that is only partially annotated or uses unknown types by falling back to normal Python dynamic behavior. In some cases (e.g. when a value of statically-unknown type is returned from a function with a return annotation), a runtime CAST opcode is inserted which will raise TypeError if the runtime type does not match the expected type.

Static Python also supports new types for machine integers, bools, doubles, and vectors/arrays. In the JIT these are handled as unboxed values, and e.g. primitive integer arithmetic avoids all Python overhead. Some operations on builtin types (e.g. list or dictionary subscript or len()) are also optimized.

Cinder supports gradual adoption of static modules via a strict/static module loader that can automatically detect static modules and load them as static with cross-module compilation. The loader will look for import __static__ and import __strict__ annotations at the top of a file, and compile modules appropriately. To enable the loader, you have one of three options:

1. Explicitly install the loader at the top level of your application via from compiler.strict.loader import install; install().

2. Set PYTHONINSTALLSTRICTLOADER=1 in your env.

3. Run ./python -X install-strict-loader application.py.
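For the explicit install option, an application entry point that may also run under stock CPython could guard the import. This is a minimal sketch: the import path is the one given above, while the install_strict_loader wrapper and its boolean return value are hypothetical.

```python
def install_strict_loader():
    """Install the strict/static loader if this interpreter provides it."""
    try:
        from compiler.strict.loader import install
    except ImportError:
        # Not running under Cinder, or the loader is unavailable.
        return False
    install()
    return True


# Call this before importing any modules that should be loaded as strict/static.
loader_installed = install_strict_loader()
```

Installing the loader first matters because it only affects modules imported after it is in place.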

Alternatively, you can compile all code statically by using ./python -m compiler --static some_module.py, which will compile the module as static Python and execute it.

See CinderDoc/static_python.rst for more detailed documentation.
