  • Language Starlark
  • License
    MIT License
  • Created over 1 year ago
  • Updated 7 days ago

Repository Details

Bazel C/C++ toolchain for cross-compiling C/C++ programs

Hermetic CC toolchain

This is a C/C++ toolchain that can (cross-)compile C/C++ programs on top of zig cc. It contains clang-16, musl, glibc 2-2.34, all in a ~40MB package. Read here about zig-cc; the rest of the README will present how to use this toolchain from Bazel.

Configuring toolchains in Bazel is complex, under-documented, and fraught with peril. We, the team behind hermetic_cc_toolchain, are still confused about how this all works, and often wonder why it works at all. That aside, we made our best effort to make hermetic_cc_toolchain usable for your C/C++/CGo projects, with as many guardrails as we could install.

While copy-pasting the code into your project, attempt to read and understand the text surrounding the code snippets. This will save you hours of head scratching.

Project Origin

This repository is cloned from and is based on Adam Bouhenguel's bazel-zig-cc, and was later developed at sr.ht/~motiejus/bazel-zig-cc. After a while this repository was moved to the Uber GitHub repository and renamed to hermetic_cc_toolchain.

Our special thanks to Adam for coming up with the idea of bazel-zig-cc, creating the original version, and publishing it. His idea and work helped make the concept of using Zig with Bazel a reality; now we all can benefit from it.

Usage

Add this to your WORKSPACE:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

HERMETIC_CC_TOOLCHAIN_VERSION = "v2.0.0"

http_archive(
    name = "hermetic_cc_toolchain",
    sha256 = "57f03a6c29793e8add7bd64186fc8066d23b5ffd06fe9cc6b0b8c499914d3a65",
    urls = [
        "https://mirror.bazel.build/github.com/uber/hermetic_cc_toolchain/releases/download/{0}/hermetic_cc_toolchain-{0}.tar.gz".format(HERMETIC_CC_TOOLCHAIN_VERSION),
        "https://github.com/uber/hermetic_cc_toolchain/releases/download/{0}/hermetic_cc_toolchain-{0}.tar.gz".format(HERMETIC_CC_TOOLCHAIN_VERSION),
    ],
)

load("@hermetic_cc_toolchain//toolchain:defs.bzl", zig_toolchains = "toolchains")

# Plain zig_toolchains() will pick reasonable defaults. See
# toolchain/defs.bzl:toolchains on how to change the Zig SDK version and
# download URL.
zig_toolchains()

And this to .bazelrc:

build --incompatible_enable_cc_toolchain_resolution

The snippets above will download the zig toolchain and make the bazel toolchains available for registration and usage. If you do nothing else, this may work. The .bazelrc snippet instructs Bazel to use the registered "new kinds of toolchains". All of the above is required regardless of how one wants to use hermetic_cc_toolchain; the next steps depend on the use case. The descriptions below also serve as a gentle introduction to C++ toolchains from the user's perspective.

Use case: manually build a single target with a specific zig cc toolchain

This option is least disruptive to the workflow compared to no hermetic C++ toolchain, and works best when trying out or getting started with hermetic_cc_toolchain for a subset of targets.

To request Bazel to use a specific toolchain (compatible with the specified platform) for build/tests/whatever on linux-arm64-musl, do:

bazel build \
    --platforms @zig_sdk//platform:linux_arm64 \
    --extra_toolchains @zig_sdk//toolchain:linux_arm64_musl \
    //test/go:go
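
The //test/go:go target belongs to this repository. In your own project the same flags apply to any C/C++ target. A minimal sketch, assuming a hypothetical hello.c next to the BUILD file (the target and file names are illustrative and not part of hermetic_cc_toolchain):

```starlark
# BUILD.bazel (hypothetical package at the root of your workspace)
# On recent Bazel versions cc_binary may need to be loaded from rules_cc.
cc_binary(
    name = "hello",        # hypothetical target name
    srcs = ["hello.c"],    # hypothetical source file
)
```

Building it with the flags above would then use //:hello in place of //test/go:go.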

There are a few things going on here, let's try to dissect them.

Option --platforms @zig_sdk//platform:linux_arm64

Specifies that our target platform is linux_arm64, which resolves into:

$ bazel query --output=build @zig_sdk//platform:linux_arm64
platform(
  name = "linux_arm64",
  generator_name = "linux_arm64",
  generator_function = "declare_platforms",
  generator_location = "platform/BUILD:7:18",
  constraint_values = ["@platforms//os:linux", "@platforms//cpu:aarch64"],
)

constraint_values instructs Bazel to be looking for a toolchain that is compatible with (in Bazelspeak, target_compatible_with) all of the ["@platforms//os:linux", "@platforms//cpu:aarch64"].

Option --extra_toolchains @zig_sdk//toolchain:linux_arm64_musl

Inspect first (@platforms//cpu:aarch64 is an alias to @platforms//cpu:arm64):

$ bazel query --output=build @zig_sdk//toolchain:linux_arm64_musl
toolchain(
  name = "linux_arm64_musl",
  generator_name = "linux_arm64_musl",
  generator_function = "declare_toolchains",
  generator_location = "toolchain/BUILD:7:19",
  toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
  target_compatible_with = ["@platforms//os:linux", "@platforms//cpu:aarch64", "@zig_sdk//libc:unconstrained"],
  toolchain = "@zig_sdk//:aarch64-linux-musl_cc",
)

For a platform to pick up the right toolchain, the platform's constraint_values must be a subset[1] of the toolchain's target_compatible_with. Here the platform's constraints are a subset of the toolchain's (so the toolchain's extra @zig_sdk//libc:unconstrained does not matter), and this toolchain is selected for this platform. As a result, --platforms @zig_sdk//platform:linux_arm64 causes Bazel to select the toolchain @zig_sdk//toolchain:linux_arm64_musl (because it satisfies all constraints), which will compile and link the C/C++ code with musl.

@zig_sdk//libc:unconstrained will become important later.

Same as above, less typing (with --config)

Specifying the platform and toolchain for every target may become burdensome, so they can be grouped under a --config. For example, append this to .bazelrc:

build:linux_arm64 --platforms @zig_sdk//platform:linux_arm64
build:linux_arm64 --extra_toolchains @zig_sdk//toolchain:linux_arm64_musl

And then building to linux-arm64-musl boils down to:

bazel build --config=linux_arm64 //test/go:go

Use case: always compile with zig cc

Instead of adding the toolchains to .bazelrc, they can be added unconditionally. Append this to WORKSPACE after zig_toolchains(...):

register_toolchains(
    "@zig_sdk//toolchain:linux_amd64_gnu.2.28",
    "@zig_sdk//toolchain:linux_arm64_gnu.2.28",
    "@zig_sdk//toolchain:darwin_amd64",
    "@zig_sdk//toolchain:darwin_arm64",
    "@zig_sdk//toolchain:windows_amd64",
    "@zig_sdk//toolchain:windows_arm64",
)

Append this to .bazelrc:

build --action_env BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1

From Bazel's perspective, this is almost equivalent to always specifying --extra_toolchains on every bazel <...> command-line invocation. It also means there is no way to disable the toolchain with the command line. This is useful if you find hermetic_cc_toolchain useful enough to compile for all of your targets and tools.

With BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 Bazel stops detecting the default host toolchain. Configuring toolchains is complicated enough, and the auto-detection (read: fallback to non-hermetic toolchain) is a footgun best avoided. This option is not documented in bazel, so may break. If you intend to use the hermetic toolchain exclusively, it won't hurt.

Use case: zig-cc for targets for multiple libc variants

When some targets need to be built with different libcs (either different versions of glibc or musl), use a linux toolchain from @zig_sdk//libc_aware/toolchain:<...>. The toolchain will only be selected when building for a specific libc. For example, in WORKSPACE:

register_toolchains(
    "@zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19",
    "@zig_sdk//libc_aware/toolchain:linux_arm64_gnu.2.28",
    "@zig_sdk//libc_aware/toolchain:x86_64-linux-musl",
)

What does @zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19 mean?

$ bazel query --output=build @zig_sdk//libc_aware/toolchain:linux_amd64_gnu.2.19 |& grep target
  target_compatible_with = ["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.19"],

To see how this relates to the platform:

$ bazel query --output=build @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19 |& grep constraint
  constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64", "@zig_sdk//libc:gnu.2.19"],

In this case, the platform's constraint_values and toolchain's target_compatible_with are identical, causing Bazel to select the right toolchain for the requested platform. With these toolchains registered, one can build a project for a specific libc-aware platform; it will select the appropriate toolchain:

$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19 //test/c:which_libc
glibc_2.19
$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_gnu.2.28 //test/c:which_libc
glibc_2.28
$ bazel run --platforms @zig_sdk//libc_aware/platform:linux_amd64_musl //test/c:which_libc
non_glibc
$ bazel run --run_under=file --platforms @zig_sdk//libc_aware/platform:linux_arm64_gnu.2.28 //test/c:which_libc
which_libc: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 2.0.0, stripped

To list the libc-aware toolchains and platforms:

$ bazel query @zig_sdk//libc_aware/toolchain/...
$ bazel query @zig_sdk//libc_aware/platform/...

Libc-aware toolchains are especially useful when relying on transitions, as transitioning extra_platforms will cause the host tools to be rebuilt with the specific libc version, which takes time; also the build host may not be able to run them if, say, target glibc version is newer than on the host. Some tests in this repository (under test/) are using transitions; you may check out how it's done.
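
As an illustration of such a transition, here is a minimal sketch of a rule that rebuilds its srcs for a given platform. It is not taken from this repository's test/ directory: the file name, rule name, and attribute names are hypothetical, and it assumes the matching libc-aware toolchains are already registered as shown above.

```starlark
# libc_pinned.bzl (hypothetical file name)

def _platform_transition_impl(settings, attr):
    # Switch the target platform to the one named in the rule's attribute.
    return {"//command_line_option:platforms": str(attr.platform)}

_platform_transition = transition(
    implementation = _platform_transition_impl,
    inputs = [],
    outputs = ["//command_line_option:platforms"],
)

def _libc_pinned_impl(ctx):
    # The rule-level transition is applied before this runs, so every
    # entry in srcs has been built for the requested platform.
    files = depset(transitive = [src[DefaultInfo].files for src in ctx.attr.srcs])
    return [DefaultInfo(files = files)]

libc_pinned = rule(
    implementation = _libc_pinned_impl,
    cfg = _platform_transition,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "platform": attr.label(mandatory = True),
        # Required by some Bazel versions for Starlark transitions.
        "_allowlist_function_transition": attr.label(
            default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
        ),
    },
)
```

A BUILD file could then load libc_pinned and instantiate it with srcs = ["//test/c:which_libc"] and platform = "@zig_sdk//libc_aware/platform:linux_amd64_gnu.2.19", so that the binary is built against glibc 2.19 regardless of the --platforms value on the command line.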

The @zig_sdk//libc:variant constraint is necessary to select a matching toolchain. Remember: the toolchain's target_compatible_with must be equal to or a superset of the platform's constraint_values. This is why both libc-aware platforms and libc-aware toolchains reside in their own namespace; if we try to mix non-libc-aware with libc-aware, confusion ensues.

To use the libc constraints in the project's platform definitions, add a @zig_sdk//libc:variant constraint to them. See the list of available values:

$ bazel query "attr(constraint_setting, @zig_sdk//libc:variant, @zig_sdk//...)"
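
For instance, a project-local platform pinned to glibc 2.28 could look like the sketch below. The platform name and its package are hypothetical. The label @zig_sdk//libc:gnu.2.28 is assumed by analogy with @zig_sdk//libc:gnu.2.19 shown earlier, so confirm it against the query output above.

```starlark
# BUILD.bazel (hypothetical package //platforms in your project)
platform(
    name = "linux_amd64_glibc_2_28",  # hypothetical name
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        # Assumed label; pick one of the values printed by the query above.
        "@zig_sdk//libc:gnu.2.28",
    ],
)
```

Passing --platforms //platforms:linux_amd64_glibc_2_28 should then select a registered libc-aware toolchain whose target_compatible_with contains the same three constraints.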

@zig_sdk//libc:unconstrained is a special value that indicates that no value for the constraint is specified. The non libc aware linux toolchains are only compatible with this value to prevent accidental silent fallthrough to them. This is a guardrail.

Note: Naming

Both Go and Bazel naming schemes are accepted. For convenience with Go, the following Go-style toolchain aliases are created:

Bazel (zig) name    Go name
x86_64              amd64
aarch64             arm64
macos               darwin

For example, the toolchain linux_amd64_gnu.2.28 is aliased to x86_64-linux-gnu.2.28. To find out which toolchains can be registered or used, run:

$ bazel query @zig_sdk//toolchain/...

Incompatibilities with clang and gcc

zig cc is almost a drop-in replacement for clang/gcc. This section lists some of the discovered differences and ways to live with them.

UBSAN and "SIGILL: Illegal Instruction"

zig cc differs from "mainstream" compilers by enabling UBSAN by default, which means your program may compile successfully and then crash with:

SIGILL: illegal instruction

Enabling UBSAN by default encourages program authors to fix the undefined behavior. There are many ways to find the undefined behavior.
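
When fixing the undefined behavior is not immediately possible, one possible stopgap is to switch the sanitizer off for a single target. The sketch below assumes that zig cc honors clang's -fno-sanitize=undefined flag; the target and file names are hypothetical, and this README does not itself recommend this approach.

```starlark
# BUILD.bazel
cc_binary(
    name = "legacy_tool",        # hypothetical target
    srcs = ["legacy_tool.c"],    # hypothetical source with known UB
    # Assumption: zig cc accepts this clang flag and disables UBSAN checks.
    copts = ["-fno-sanitize=undefined"],
)
```

Keeping the flag on individual targets, rather than in .bazelrc, leaves the default check in place for the rest of the code base.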

Known Issues In hermetic_cc_toolchain

These are the things you may stumble into when using hermetic_cc_toolchain. We are unlikely to implement them any time soon, but patches implementing those will be accepted.

Zig cache location

Currently zig cache is stored in /tmp/hermetic_cc_toolchain, so bazel clean --expunge will not clear the zig cache. Zig's cache should be stored somewhere in the project's path. It is not clear how to do it.

OSX: sysroot

For non-trivial programs (and for all darwin/arm64 cgo programs) the MacOS SDK may be necessary. Read Jakub's comment about it. Support for OSX sysroot is currently not implemented, but patches implementing it will be accepted, as long as the OSX sysroot comes through an http_archive.

In essence, OSX target support is not well tested with hermetic_cc_toolchain.

Known Issues In Upstream

This section lists issues that we have stumbled into when using zig cc, and that are outside of hermetic_cc_toolchain's control.

Number of libc stubs with Go 1.20+

Until Go 1.19 the number of glibc stubs that needed to be compiled was strictly controlled. Go 1.20 no longer ships with pre-compiled archive files for the standard library, and it generates them on the fly, causing many extraneous libc stubs. Therefore, the initial compilation will take longer until those stubs are pre-cached.

Host Environments

This repository is used on the following (host) platforms:

  • linux_amd64, a.k.a. x86_64.
  • linux_arm64, a.k.a. AArch64.
  • darwin_amd64, the 64-bit post-PowerPC models.
  • darwin_arm64, the M1.
  • windows_amd64, a.k.a. x64.

The tests run in CI on linux-amd64.

Transient docker environment

A standalone Docker environment to play with hermetic_cc_toolchain:

$ docker run -e CC=/usr/bin/false -ti --rm -v "$PWD:/x" -w /x debian:bookworm-slim
# apt update && apt install --no-install-recommends -y shellcheck ca-certificates python3 git
# git config --global --add safe.directory /x
# ./ci/lint
# ./ci/release
# ./ci/test
# ./ci/zig-wrapper

Communication

We maintain two channels for comms:

  • Github issues and pull requests.
  • Slack: #zig in bazel.slack.com.

Previous Communications

Previous communications were done in a mailing list; the past archive can be accessed like this:

git checkout v2.0.0-rc2 mailing-list-archive.mbox
mutt -R -f mailing-list-archive.mbox

Maintainers

Guidelines for maintainers[2]:

  • Communicate intent precisely.
  • Edge cases matter.
  • Favor reading code over writing code.
  • Only one obvious way to do things.
  • Runtime crashes are better than bugs.
  • Compile errors are better than runtime crashes.
  • Incremental improvements.
  • Avoid local maximums.
  • Reduce the amount one must remember.
  • Focus on code rather than style.
  • Resource allocation may fail; resource deallocation must succeed.
  • Memory is a resource.
  • Together we serve the users.

On a more practical note:

  • Maintainers can merge others' pull requests following their best judgement. They may or may not ask for feedback from other maintainers. Follow the Zen of Zig.
  • Releases are cut by Uber employees, because they can test the version-to-be-released with our Go Monorepo. If you use hermetic_cc_toolchain in any serious capacity, we encourage you to make yourself known, so we can work together to validate it before cutting the release.

Footnotes

  1. a mathematical subset: both can be equal. ↩

  2. Credit: zig zen ↩
