Coz: Causal Profiling

Coz: Finding Code that Counts with Causal Profiling

by Charlie Curtsinger and Emery Berger.

Coz is a new kind of profiler that unlocks optimization opportunities missed by traditional profilers. Coz employs a novel technique we call causal profiling that measures optimization potential. This measurement matches developers' assumptions about profilers: that optimizing highly-ranked code will have the greatest impact on performance. Causal profiling measures optimization potential for serial, parallel, and asynchronous programs without instrumentation or special handling for library calls and concurrency primitives. Instead, a causal profiler uses performance experiments to predict the effect of optimizations. This allows the profiler to establish causality: "optimizing function X will have effect Y," exactly the measurement developers had assumed they were getting all along.

Example Coz profile

The above profile generated by Coz shows the "bang for buck" of optimizing a line of code in the Ferret application. Almost every effort to optimize the performance of this line of code directly leads to an increase in overall performance, making it an excellent candidate for optimization efforts.

Full details of Coz are available in our paper, Coz: Finding Code that Counts with Causal Profiling (pdf), SOSP 2015, October 2015 (recipient of a Best Paper Award).

Coz presentation at SOSP

Installation

On Debian and Ubuntu, you can install Coz via apt:

sudo apt install coz-profiler

An OpenSUSE package was prepared by user @zethra and is available at https://build.opensuse.org/package/show/home:zethra/coz-profiler.

Coz should work on any modern Linux system (specifically, one running kernel version 2.6.32 or later, with support for the perf_event_open system call) with a Python 3.x interpreter.

Libraries/Wrappers

By default, Coz works for C, C++, and Rust programs. It has been ported to, or has wrappers for, several other languages, listed below:

Language   Link
Java       JCoz: https://github.com/Decave/JCoz
Go         Cozgo: https://github.com/urjitbhatia/cozgo
Swift      Swift Coz: https://github.com/funcmike/swift-coz

Building Coz From Source

To build Coz from source, you will need:

  • A copy of the source code for this project
  • A compiler with C++0x support (clang++ or g++)
  • A Python interpreter (Python 3.x is required)
  • OPTIONAL: for building the profiler viewer, you need NodeJS and npm -- sudo apt-get install nodejs npm

Once you have all dependencies in place, build Coz with CMake. On Debian-based distributions, the following commands should take care of the entire process:

sudo apt-get install build-essential cmake docutils-common git python3 pkg-config
git clone https://github.com/antoyo/libelfin && cd libelfin && make && sudo make install && cd ..
git clone https://github.com/plasma-umass/coz && cd coz && cmake . && make && sudo make install && cd ..

Next, you need to change the "perf_event_paranoid" level so Coz can run.

sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'
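To confirm that the setting took effect, a small check along the following lines may help. This is only a sketch: the helper names read_paranoid_level and level_allows_coz are hypothetical, and the threshold (level 1 or lower) is an assumption taken from the command above, not a check that Coz itself is documented to perform.

```c
#include <assert.h>
#include <stdio.h>

/* Reads an integer setting from `path` into `*level`.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * The status is returned separately because -1 is itself a valid level. */
int read_paranoid_level(const char *path, int *level) {
    assert(path != NULL && level != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    int parsed = fscanf(f, "%d", level);
    fclose(f);
    return parsed == 1 ? 0 : -1;
}

/* Assumption: level 1 or lower is enough for Coz, matching the command above. */
int level_allows_coz(int level) {
    return level <= 1;
}
```

On a Linux system, calling read_paranoid_level("/proc/sys/kernel/perf_event_paranoid", &level) and passing the result to level_allows_coz gives a quick yes/no answer before starting a profiling run.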

Now you can test Coz. Build the benchmark suite and run one of the benchmarks (the SQLite3 benchmark takes a while to build).

sudo apt-get install libbz2-dev libsqlite3-dev
cd coz/benchmarks && cmake . && make && cd ../..
coz run --- ./coz/benchmarks/toy/toy

Finally, use the Coz viewer to see the results. The following command opens a browser tab, from which you will need to load the file profile.coz.

coz plot

(You may need to move the "Minimum Points" slider on the left side to see the results.)

Using Coz

Using Coz requires a small amount of setup, but you can jump ahead to the section on the included sample applications in this repository if you want to try Coz right away.

To run your program with Coz, you will need to build it with debug information (-g -gdwarf-3). You do not need to include debug symbols in the main executable: coz uses the same procedure as gdb to locate debug information for stripped binaries.

Once you have your program built with debug information, you can run it with Coz using the command coz run {coz options} --- {program name and arguments}. However, to produce a useful profile, you need to decide which part(s) of the application you want to speed up by specifying one or more progress points.

Profiling Modes

Coz departs from conventional profiling by making it possible to view the effect of optimizations on both throughput and latency. To profile throughput, you must specify a progress point. To profile latency, you must specify a pair of progress points.

Throughput Profiling: Specifying Progress Points

To profile throughput you must indicate a line in the code that corresponds to the end of a unit of work. For example, a progress point could be the point at which a transaction concludes, when a web page finishes rendering, or when a query completes. Coz then measures the rate of visits to each progress point to determine any potential optimization's effect on throughput.

To place a progress point, include coz.h (under the include directory in this repository) and add the COZ_PROGRESS macro to at least one line you would like to execute more frequently. Don't forget to link your program with libdl: use the -ldl option.

By default, Coz uses the source file and line number as the name for your progress points. If you use COZ_PROGRESS_NAMED("name for progress point") instead, you can provide an informative name for your progress points. This also allows you to mark multiple source locations that correspond to the same progress point.
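As an illustration of a named throughput progress point, here is a minimal sketch. The function handle_request and the name "request handled" are hypothetical. The preprocessor fallback is also an assumption of this sketch: it defines the macros as no-ops when coz.h is not on the include path, so the file still compiles without Coz installed. Both exit paths use the same name, so they count as one progress point.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real macros when coz.h is available; otherwise fall back to
 * no-ops (an assumption of this sketch, not part of Coz). */
#if defined(__has_include)
#  if __has_include(<coz.h>)
#    include <coz.h>
#  endif
#endif
#ifndef COZ_PROGRESS_NAMED
#  define COZ_PROGRESS_NAMED(name) ((void)0)
#endif

/* Hypothetical unit of work: sums the payload of one request. */
long handle_request(const int *payload, size_t len) {
    assert(payload != NULL || len == 0);
    if (len == 0) {
        COZ_PROGRESS_NAMED("request handled"); /* fast path */
        return 0;
    }
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += payload[i];
    }
    COZ_PROGRESS_NAMED("request handled"); /* slow path, same progress point */
    return sum;
}
```

When building against the real header, compile with debug information and link libdl, for example with -g -gdwarf-3 and -ldl.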

Latency Profiling: Specifying Progress Points

To profile latency, you must place two progress points that correspond to the start and end of an event of interest, such as when a transaction begins and completes. Simply mark the beginning of a transaction with the COZ_BEGIN("transaction name") macro, and the end with the COZ_END("transaction name") macro. Unlike regular progress points, you always need to specify a name for your latency progress points. Don't forget to link your program with libdl: use the -ldl option.
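A latency measurement might be marked up as in the following sketch. The function run_transaction and the name "transaction" are hypothetical, and the no-op fallback macros are an assumption added so the file compiles even when coz.h is not installed.

```c
#include <assert.h>
#include <stddef.h>

/* Use the real macros when coz.h is available; otherwise fall back to
 * no-ops (an assumption of this sketch, not part of Coz). */
#if defined(__has_include)
#  if __has_include(<coz.h>)
#    include <coz.h>
#  endif
#endif
#ifndef COZ_BEGIN
#  define COZ_BEGIN(name) ((void)0)
#endif
#ifndef COZ_END
#  define COZ_END(name) ((void)0)
#endif

/* Hypothetical transaction: computes a simple checksum over a buffer. */
unsigned run_transaction(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    COZ_BEGIN("transaction"); /* start of the event of interest */
    unsigned checksum = 0;
    for (size_t i = 0; i < len; i++) {
        checksum = checksum * 31u + buf[i];
    }
    COZ_END("transaction"); /* end of the event; name must match COZ_BEGIN */
    return checksum;
}
```

The begin and end markers share one name, which is how the pair is associated with a single latency measurement.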

When coz tests a hypothetical optimization, it reports the effect of that optimization on the average latency between these two points. Coz can track this information without any knowledge of individual transactions thanks to Little's Law.
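For reference, Little's Law in its standard form relates the average number of transactions in progress (L), the average rate at which transactions begin (λ), and the average latency of a transaction (W). The worked numbers below are hypothetical and only illustrate the arithmetic. How Coz derives L and λ internally is described in the paper, not here.

```latex
L = \lambda W \quad\Longrightarrow\quad W = \frac{L}{\lambda}

% Hypothetical example: 4 transactions in progress on average,
% 200 transactions beginning per second.
W = \frac{4}{200\ \mathrm{s}^{-1}} = 0.02\ \mathrm{s} = 20\ \mathrm{ms}
```

Because both L and λ are aggregate quantities, the average latency follows without matching any individual begin event to its end event.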

Specifying Progress Points on the Command Line

Coz has command line options to specify progress points when profiling the application instead of modifying its source. This feature is currently disabled because it did not work particularly well. Adding support for better command line-specified progress points is planned in the near future.

Processing Results

To plot profile results, go to http://plasma-umass.github.io/coz/ and load your profile. This page also includes several sample profiles from PARSEC benchmarks.

Sample Applications

The benchmarks directory in this repository includes several small benchmarks with progress points added at appropriate locations. To build and run one of these benchmarks with coz, just browse to benchmarks/{bench name} and type cmake . && make. These programs may require several runs before coz has enough measurements to generate a useful profile. Once you have profiled these programs for several minutes, go to http://plasma-umass.github.io/coz/ to load and plot your profile.

CMake

When you install coz, it also installs a CMake config file. To add coz to a CMake project, simply use the command find_package(coz-profiler). This imports two targets: coz::coz, for the library and include files, and coz::profiler, for the coz binary. For guidance on how to use these targets, refer to the CMake documentation.

Limitations

Coz currently does not support interpreted or JIT-compiled languages such as Python, Ruby, or JavaScript. Interpreted languages will likely not be supported at any point, but support for JIT-compiled languages that produce debug information could be added in the future.

License

All source code is licensed under the BSD 2-clause license unless otherwise indicated. See LICENSE.md for details.

Sample applications (in the benchmarks directory) include several Phoenix programs and pbzip2, which are licensed separately and included with this release for convenience.
