Heat map generation tools
HeatMap

Some software to generate heat maps:

trace2heatmap.pl	converts a trace of per-event latency to an interactive SVG heat map.

See http://dtrace.org/blogs/brendan/2013/05/19/revealing-hidden-latency-patterns


trace2heatmap
=============
This is a quick program to generate heat maps from trace files.  I wrote it
in 3 hours, so it's probably buggy (especially input checking).

It takes input as two numerical columns, a time and a latency. For example:

$ more trace.txt
17442020318913 8026
17442020325950 6798
17442020333082 6907
17442020339374 6065
[...]

Each row is an event (eg, an I/O).  The first column is time of the event and
the second is latency.  In this example, both columns are microseconds.
See the later Generating Latency Traces section for how to generate these.
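
Since input checking in the tool may be weak, it can help to sanity-check a
trace before converting it. The following is a minimal Python sketch, not part
of trace2heatmap.pl; the function name parse_trace is made up for this
example, and it assumes that rows which are not two numeric columns should
simply be skipped:

```python
def parse_trace(lines):
    """Yield (time, latency) pairs from two-column text, skipping bad rows."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank, truncated, or placeholder line such as "[...]"
        try:
            yield float(fields[0]), float(fields[1])
        except ValueError:
            continue  # row with non-numeric columns


sample = """17442020318913 8026
17442020325950 6798
[...]
not numbers
17442020333082 6907
"""
events = list(parse_trace(sample.splitlines()))
```

Here events ends up holding only the three well-formed rows.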

This example can be converted into an SVG heatmap using:

$ ./trace2heatmap.pl --unitstime=us --unitslabel=us trace.txt > heatmap.svg

Other units can also be used ("ms", "ns").

The y-axis will auto-scale to include everything, including latency outliers.
While useful, you generally want to generate a second heatmap that excludes
those so you can study the bulk of the data.  Eg:

$ ./trace2heatmap.pl --unitstime=us --unitslabel=us --maxlat=10000 trace.txt > heatmap2.svg

This limits the latency range to 10000 us.
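
One way to pick a --maxlat value is to take a high percentile of the latency
column, so that only the outliers above it are cut off. This is a rough sketch
and only one possible approach; the percentile helper and the choice of
percentile are assumptions of this example, not something the tool does:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies = [8026, 6798, 6907, 6065, 250000]  # last value is an outlier
maxlat = percentile(latencies, 80)
```

The resulting maxlat could then be passed as --maxlat to exclude the outlier.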

A --minlat option can also be used.  Run --help for the full list, which
includes --titletext to customize the title, and --grid to draw grid lines.

When doing a mouse-over of rectangles (histogram buckets), the following
information will be displayed at the bottom of the heat map:

- time: elapsed time in seconds.
- range: latency range (y-axis) shown by the rectangle.
- count: number of events in this rectangle (time and latency range).
- pct: shows number of events in this rectangle as a percentage of all those in the column.
- acc: accumulated count, counting from bottom-up in the column.
- acc pct: accumulated count as a percentage. This can be used to find the percentile points.
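
To make the count, pct, acc, and acc pct fields concrete, here is a small
Python sketch that derives them for one column of buckets. It follows the
definitions in the list above; column_stats is a hypothetical name, and the
exact rounding the tool displays is not assumed here:

```python
def column_stats(counts):
    """counts: events per latency bucket in one column, lowest latency first."""
    total = sum(counts)
    rows = []
    acc = 0
    for count in counts:
        acc += count  # accumulate from the bottom of the column upward
        rows.append({
            "count": count,
            "pct": 100 * count / total if total else 0.0,
            "acc": acc,
            "acc_pct": 100 * acc / total if total else 0.0,
        })
    return rows


rows = column_stats([2, 5, 2, 1])
```

The first bucket whose acc_pct reaches, say, 90 contains the 90th percentile
for that column.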

trace2heatmap can generate other heat maps, not just latency.  As an example of
another type, see the utilization heat map in:
http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization


Generating Latency Traces
=========================
An example trace file is included, example-trace.txt, which was generated
using a DTrace program called iosnoop (from the DTraceToolkit):

$ ./iosnoop -Dt > out.iosnoop
$ awk '{ print $1, $2 }' out.iosnoop > example-trace.txt

These are performed as separate steps so that the original iosnoop output can
be reinspected to see more details if interesting features were found in the
heat map.  I typically run it with "iosnoop -Dots".  Note that most versions
of iosnoop need dynvarsize increased to avoid "dynamic variable drops": find
the line that has "#pragma D option quiet" and add the following line below
it: "#pragma D option dynvarsize=16m".

Here's an example DTrace one-liner that will generate trace output, both 
columns in microseconds, for syscall reads:

$ dtrace -qn 'syscall::read:entry { self->ts = timestamp; }
    syscall::read:return /self->ts/ {
    printf("%d %d\n", timestamp / 1000, (timestamp - self->ts) / 1000); self->ts = 0; }'

This is system-wide; add a predicate to filter for applications.

I could add more examples, but you probably get the picture: anything that can
emit times and latency can be processed.
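
As one illustration outside DTrace, an application can emit the same two
columns itself. This Python sketch times a call and writes the completion
time and latency in microseconds, mirroring the one-liner above; trace_call
and its out parameter are invented for this example:

```python
import time


def trace_call(func, *args, out=print):
    """Run func(*args), then emit 'time_us latency_us' for the call."""
    start_ns = time.monotonic_ns()
    result = func(*args)
    end_ns = time.monotonic_ns()
    # Both columns in microseconds: completion time, then latency.
    out(f"{end_ns // 1000} {(end_ns - start_ns) // 1000}")
    return result


lines = []
trace_call(time.sleep, 0.001, out=lines.append)
```

With the default out=print, redirecting the program's output to a file gives a
trace that can be run with --unitstime=us --unitslabel=us.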


Tracing in Production
=====================
Tracing per-event latency can be expensive to perform.  DTrace minimizes the
overheads as much as possible using per-CPU buffers and asynchronous kernel-
user transfers; other tools (eg, strace, tcpdump) are expected to have higher
overhead.  This can cause problems for production use: you want to understand
the overhead, including when using DTrace, before tracing events.

Heat maps have been used successfully in production -- and recorded at a one
second granularity 24x7x365 -- by some products built upon DTrace.  These use
the DTrace aggregating feature to pass a quantized summary of latency to
user-level, instead of every event, cutting the data transfer down by a
large factor (eg, 1000x). This summary may consist of a per-second array with
about 200 elements for different latency ranges, each containing the count of
events, and is from the DTrace aggregating functions quantize(), lquantize(),
or llquantize() (best).  This array is then resampled (downsampled) to the
resolution desired for the heat map (usually down to 30 or so levels).
Example products that do this are the Oracle ZFS Storage Appliance, and Joyent
Cloud Analytics.
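
The resampling step can be sketched as merging adjacent buckets. This Python
example is a simplification and not how those products implement it: it
assumes equal-width (linear) buckets, whereas a log-linear summary would need
buckets mapped by their latency ranges. The name downsample is made up here:

```python
def downsample(counts, levels):
    """Merge adjacent buckets so len(counts) buckets become `levels` buckets."""
    if levels <= 0:
        raise ValueError("levels must be positive")
    out = [0] * levels
    for i, count in enumerate(counts):
        # Map source bucket i proportionally onto a target bucket.
        out[i * levels // len(counts)] += count
    return out


fine = [1] * 200               # one second of data in 200 latency buckets
coarse = downsample(fine, 30)  # 30 levels for drawing
```

Event counts are preserved: each source bucket lands in exactly one target
bucket, so the totals before and after match.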


Provided Example
================
An example output is included of a disk I/O trace, and the resulting heat map.
You can generate it using:

$ ./trace2heatmap.pl --unitstime=us --unitslabel=us --maxlat=2000 --grid example-trace.txt > example-heatmap.svg 

This excludes outliers, so that the bulk of the I/O can be examined.
