

cargo-pgo

Cargo subcommand that makes it easier to use PGO and BOLT to optimize Rust binaries and libraries.

Installation

$ cargo install cargo-pgo

You will also need the llvm-profdata binary for PGO, and the llvm-bolt and merge-fdata binaries for BOLT.

You can install the PGO helper binary by adding the llvm-tools-preview component to your toolchain with rustup:

$ rustup component add llvm-tools-preview

For BOLT, it's unfortunately more complicated. See the BOLT installation guide below.

BOLT support is currently experimental.

PGO/BOLT workflow

It is important to understand the workflow of using feedback-directed optimizations. Put simply, it consists of three general steps:

  1. Build binary with instrumentation
    • Perform a special build of your executable which will add additional instrumentation code to it.
  2. Gather performance profiles
    • Run your instrumented binary on representative workloads. The binary will generate profile files on disk, which will then be used to optimize the binary.
    • Try to gather as much data as possible. Ideally, run your binary for at least a minute.
  3. Build an optimized binary using generated profiles
    • The compiler will use the generated profiles to build an optimized version of your binary.
    • The binary will be optimized with respect to the profiled workloads. If you execute it on a substantially different workload, the optimizations might not work (or they might even make your binary slower!).
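The three steps above can be sketched as a small shell helper. This is only an illustration and not part of cargo-pgo: the run and pgo_workflow function names and the DRY_RUN variable are hypothetical, and the path to the instrumented binary is supplied by the caller rather than guessed. The cargo pgo build and cargo pgo optimize commands are the ones described in the Usage section.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three workflow steps.
# $1 is the path to the instrumented binary; any remaining arguments
# are passed to it as the profiling workload.
pgo_workflow() {
    binary="$1"
    shift
    # 1. Build the binary with instrumentation.
    run cargo pgo build || return 1
    # 2. Gather profiles by running the instrumented binary on a workload.
    run "$binary" "$@" || return 1
    # 3. Build the optimized binary using the gathered profiles.
    run cargo pgo optimize
}
```

With DRY_RUN=1 the helper only prints the three commands, which is a cheap way to check the sequence before spending time on a real profiling run.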

Example

[Image: example usage of the tool]

Usage

Before you start to optimize your binaries, you should first check if your environment is set up correctly, at least for PGO (BOLT is more complicated). You can do that using the info command:

$ cargo pgo info

PGO

cargo-pgo provides subcommands that wrap common Cargo commands. It will automatically add --release to wrapped commands where it is applicable, since it doesn't really make sense to perform PGO on debug builds.

Generating the profiles

First, you need to generate the PGO profiles by performing an instrumented build. You can currently do that in several ways. The most generic command for creating an instrumented artifact is cargo pgo instrument:

$ cargo pgo instrument [<command>] -- [cargo-args]

The <command> argument specifies which Cargo command will be executed. It is optional and defaults to build. You can pass additional arguments to Cargo after --.

There are several ways of producing the profiles:

  • Building a binary

    $ cargo pgo build
    # or
    $ cargo pgo instrument build

    This is the simplest and recommended approach. You build an instrumented binary and then run it on some workloads. Note that the binary will be located at <target-dir>/<target-triple>/release/<binary-name>.

  • Running an instrumented program

    $ cargo pgo run
    # or
    $ cargo pgo instrument run

    You can also directly execute an instrumented binary with the cargo pgo run command, which is a shortcut for cargo pgo instrument run. This command will instrument the binary and then execute it right away.

  • Running instrumented tests

    $ cargo pgo test
    # or
    $ cargo pgo instrument test

    This command will generate profiles by executing tests. Note that unless your test suite is really comprehensive, it might be better to create a binary and run it on some specific workloads instead.

  • Running instrumented benchmarks

    $ cargo pgo bench
    # or
    $ cargo pgo instrument bench

    This command will generate profiles by executing benchmarks.
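Because the instrumented binary ends up under a target-triple directory, a short sketch of composing its path may help. The instrumented_binary_path and host_triple function names are hypothetical, "target" as the default target directory is an assumption (it is Cargo's default), and the sketch assumes the build targets the host triple.

```shell
# Hypothetical helper: compose <target-dir>/<target-triple>/release/<binary-name>.
# $1 = binary name
# $2 = target triple
# $3 = target directory (assumed to default to "target")
instrumented_binary_path() {
    name="$1"
    triple="$2"
    target_dir="${3:-target}"
    printf '%s/%s/release/%s\n' "$target_dir" "$triple" "$name"
}

# One way to look up the host triple: the "host:" line printed by `rustc -vV`.
host_triple() {
    rustc -vV | sed -n 's/^host: //p'
}
```

For example, instrumented_binary_path myapp "$(host_triple)" prints the expected location of an instrumented binary named myapp (a placeholder name).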

Building an optimized binary

Once you have generated some profiles, you can execute cargo pgo optimize to build an optimized version of your binary.

If you want, you can also pass a command to cargo pgo optimize to e.g. run PGO-optimized benchmarks or tests:

$ cargo pgo optimize bench
$ cargo pgo optimize test

Analyzing PGO profiles

You can analyze gathered PGO profiles using the llvm-profdata binary:

$ llvm-profdata show <profile>.profdata

BOLT

Using BOLT with cargo-pgo is similar to using PGO. However, you have to build BOLT manually, and support for it is currently experimental.

BOLT is not supported directly by rustc, so the instrumentation and optimization commands are not directly applied to binaries built by rustc. Instead, cargo-pgo creates additional binaries that you have to use for gathering profiles and executing the optimized code.

Generating the profiles

First, you need to generate the BOLT profiles. To do that, execute the following command:

$ cargo pgo bolt build

The instrumented binary will be located at <target-dir>/<target-triple>/release/<binary-name>-bolt-instrumented. Execute it on several workloads to gather as much data as possible.

Note that for BOLT, the profile gathering step is optional. You can also simply run the optimization step (see below) without any profiles, although it will probably not have a large effect.

Building an optimized binary

Once you have generated some profiles, you can execute cargo pgo bolt optimize to build an optimized version of your binary. The optimized binary will be named <binary-name>-bolt-optimized.
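The naming scheme of the BOLT artifacts can be captured in a tiny helper. This is a sketch only: bolt_artifact_path is a hypothetical name, and it assumes the optimized binary is written next to the instrumented one, which the text above does not state explicitly.

```shell
# Hypothetical helper: path of a BOLT artifact produced for a binary.
# $1 = release directory (<target-dir>/<target-triple>/release)
# $2 = binary name
# $3 = kind: "instrumented" or "optimized"
bolt_artifact_path() {
    case "$3" in
        instrumented|optimized)
            printf '%s/%s-bolt-%s\n' "$1" "$2" "$3"
            ;;
        *)
            echo "unknown kind: $3" >&2
            return 1
            ;;
    esac
}
```

Keeping the suffixes in one place makes it harder to accidentally benchmark the plain release binary instead of the BOLT-optimized one.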

BOLT + PGO

Yes, BOLT and PGO can even be combined :) To do that, you should first generate PGO profiles and then use BOLT on the already PGO-optimized binaries. You can do that using the --with-pgo flag:

# Build PGO instrumented binary
$ cargo pgo build
# Run binary to gather PGO profiles
$ ./target/.../<binary>
# Build BOLT instrumented binary using PGO profiles
$ cargo pgo bolt build --with-pgo
# Run binary to gather BOLT profiles
$ ./target/.../<binary>-bolt-instrumented
# Optimize a PGO-optimized binary with BOLT
$ cargo pgo bolt optimize --with-pgo

Do not strip symbols from your release binary when using BOLT! If you do, you might encounter linker errors.

BOLT installation

Here's a short guide on how to compile LLVM with BOLT manually. You will need a recent compiler, CMake, and Ninja.

Note: LLVM BOLT is slowly getting into package repositories, although it's not fully working out of the box yet. You can find more details here if you're interested.

  1. Download LLVM
    $ git clone https://github.com/llvm/llvm-project
    $ cd llvm-project
  2. (Optional) Check out a stable version, at least 14.0.0
    $ git checkout llvmorg-14.0.5
    Note that BOLT is being actively fixed, so a trunk version of LLVM might actually work better.
  3. Prepare the build
    $ cmake -S llvm -B build -G Ninja \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=${PWD}/llvm-install \
      -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt;bolt"
  4. Compile LLVM with BOLT
    $ cd build
    $ ninja
    $ ninja install 
    The built files should be located at <llvm-dir>/llvm-install/bin. You should add this directory to $PATH to make BOLT usable with cargo-pgo.
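A minimal sketch of that last step, assuming a POSIX shell. The prepend_path and tool_status helpers are hypothetical names, not part of cargo-pgo or LLVM, and <llvm-dir> remains a placeholder for wherever you cloned llvm-project.

```shell
# Hypothetical helper: put a directory at the front of PATH
# unless it is already listed there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Hypothetical helper: report whether a tool can be resolved through PATH.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}
```

After calling prepend_path with <llvm-dir>/llvm-install/bin, tool_status llvm-bolt and tool_status merge-fdata should both report "found". You can then re-run cargo pgo info to see what cargo-pgo itself detects.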

Related work

  • cargo-pgo: I basically reimplemented this crate independently. It uses an almost identical approach, but doesn't support BOLT. It's no longer maintained, and I got permission from its author to (re)use its name.

License

MIT
