
🚀 fgprof - The Full Go Profiler

fgprof is a sampling Go profiler that allows you to analyze On-CPU as well as Off-CPU (e.g. I/O) time together.

Go's builtin sampling CPU profiler can only show On-CPU time, but it's better than fgprof at that. Go also includes tracing profilers that can analyze I/O, but they can't be combined with the CPU profiler.

fgprof is designed for analyzing applications with mixed I/O and CPU workloads. This kind of profiling is also known as wall-clock profiling.

⚠️ Please upgrade to Go 1.19 or newer. In older versions of Go, fgprof can cause significant stop-the-world (STW) latencies in applications with a lot of goroutines (more than 1-10k). See CL 387415 for more details.

Quick Start

If this is the first time you're hearing about fgprof, you should start by reading The Problem and How it Works below.

There is no need to choose between fgprof and the builtin profiler. Here is how to add both to your application:

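package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the builtin /debug/pprof/* handlers

	"github.com/felixge/fgprof"
)

func main() {
	// Serve fgprof next to the builtin pprof endpoints on the default mux.
	http.DefaultServeMux.Handle("/debug/fgprof", fgprof.Handler())
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()

	// <code to profile>
}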

fgprof is compatible with the go tool pprof visualizer, so taking and analyzing a 3s profile is as simple as:

go tool pprof --http=:6061 http://localhost:6060/debug/fgprof?seconds=3

Additionally, fgprof supports the plain text format used by Brendan Gregg's FlameGraph utility:

git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
curl -s 'localhost:6060/debug/fgprof?seconds=3&format=folded' > fgprof.folded
./flamegraph.pl fgprof.folded > fgprof.svg

Which tool you prefer is up to you, but one thing I like about Gregg's tool is that you can filter the plain text files using grep, which can be very useful when analyzing large programs.

If you don't have a program to profile right now, you can go run ./example, which should allow you to reproduce the graphs you see above. If you've never seen such graphs before and are unsure how to read them, head over to Brendan Gregg's Flame Graph page.

The Problem

Let's say you've been tasked with optimizing a simple program that has a loop calling out to three functions:

func main() {
	for {
		// HTTP request to a web service that might be slow.
		slowNetworkRequest()
		// Some heavy CPU computation.
		cpuIntensiveTask()
		// Poorly named function that you don't understand yet.
		weirdFunction()
	}
}

One way to decide which of these three functions you should focus your attention on would be to wrap each function call like this:

start := time.Now()
slowNetworkRequest()
fmt.Printf("slowNetworkRequest: %s\n", time.Since(start))
// ...

However, this can be very tedious for large programs. You'll also have to figure out how to average the numbers in case they fluctuate. And once you've done that, you'll have to repeat the process for the functions called by the function you decide to focus on.

/debug/pprof/profile

So, this seems like a perfect use case for a profiler. Let's try the /debug/pprof/profile endpoint of the builtin net/http/pprof package to analyze our program for 10s:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()

	// <code to profile>
}

go tool pprof -http=:6061 http://localhost:6060/debug/pprof/profile?seconds=10

That was easy! Looks like we're spending all our time in cpuIntensiveTask(), so let's focus on that?

But before we get carried away, let's quickly double check this assumption by manually timing our function calls with time.Since() as described above:

slowNetworkRequest: 66.815041ms
cpuIntensiveTask: 30.000672ms
weirdFunction: 10.64764ms
slowNetworkRequest: 67.194516ms
cpuIntensiveTask: 30.000912ms
weirdFunction: 10.105371ms
// ...

Oh no, the builtin CPU profiler is misleading us! How is that possible? Well, it turns out the builtin profiler only shows On-CPU time. Time spent waiting on I/O is completely hidden from us.

/debug/pprof/trace

Let's try something else. The /debug/pprof/trace endpoint includes a "synchronization blocking profile", maybe that's what we need?

curl -so pprof.trace http://localhost:6060/debug/pprof/trace?seconds=10
go tool trace --pprof=sync pprof.trace > sync.pprof
go tool pprof --http=:6061 sync.pprof

Oh no, we're being misled again. This profiler thinks all our time is spent on slowNetworkRequest(). It's completely missing cpuIntensiveTask(). And what about weirdFunction()? It seems like no builtin profiler can see it?

/debug/fgprof

So what can we do? Let's try fgprof, which is designed to analyze mixed I/O and CPU workloads like the one we're dealing with here. We can easily add it alongside the builtin profilers.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"

	"github.com/felixge/fgprof"
)

func main() {
	http.DefaultServeMux.Handle("/debug/fgprof", fgprof.Handler())
	go func() {
		log.Println(http.ListenAndServe(":6060", nil))
	}()

	// <code to profile>
}

go tool pprof --http=:6061 http://localhost:6060/debug/fgprof?seconds=10

Finally, a profile that shows all three of our functions and how much time we're spending on them. It also turns out our weirdFunction() was simply calling time.Sleep(), how weird indeed!

How it Works

fgprof

fgprof is implemented as a background goroutine that wakes up 99 times per second and calls runtime.GoroutineProfile. This returns a list of all goroutines and their call stacks, regardless of whether they are currently scheduled On-CPU or Off-CPU.

This data is used to maintain an in-memory stack counter, which can be converted to the pprof or folded output format. The meat of the implementation is super simple and fewer than 100 lines of code, so you should check it out.

The overhead of fgprof increases with the number of active goroutines (including those waiting on I/O, channels, locks, etc.) in your program. If your program typically has fewer than 1,000 active goroutines, you shouldn't have much to worry about. However, at 10k or more goroutines fgprof might start to cause noticeable overhead.
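For a rough feel for this overhead on your own machine, the following sketch times a single runtime.GoroutineProfile call while n idle goroutines are parked on a channel. profileCost is a hypothetical helper. Since fgprof samples 99 times per second, multiplying the printed time by 99 gives a crude estimate of the per-second cost. Results depend heavily on hardware and Go version, so no numbers are claimed here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// profileCost spawns n goroutines parked on a channel and returns how long a
// single runtime.GoroutineProfile call takes while they are alive.
func profileCost(n int) time.Duration {
	block := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-block // Off-CPU, but still captured by GoroutineProfile
		}()
	}
	defer func() {
		close(block)
		wg.Wait()
	}()

	records := make([]runtime.StackRecord, runtime.NumGoroutine()+100)
	start := time.Now()
	for {
		m, ok := runtime.GoroutineProfile(records)
		if ok {
			break
		}
		// More goroutines appeared than expected; grow and retry.
		records = make([]runtime.StackRecord, m+100)
	}
	return time.Since(start)
}

func main() {
	for _, n := range []int{100, 1000, 10000} {
		fmt.Printf("%6d goroutines: %s per sample\n", n, profileCost(n))
	}
}
```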

Go's builtin CPU Profiler

The builtin Go CPU profiler uses the setitimer(2) system call to ask the operating system to send the process a SIGPROF signal 100 times per second. Each signal interrupts the Go process and is delivered to a random thread's sigtrampgo() function. This function then calls sigprof() or sigprofNonGo() to record the thread's current stack.

Since Go uses non-blocking I/O, goroutines that wait on I/O are parked and not running on any thread. Therefore they end up being largely invisible to Go's builtin CPU profiler.

Known Issues

There is no perfect approach to profiling, and fgprof is no exception. Below is a list of known issues that will hopefully not be of practical concern for most users, but are important to highlight.

  • Internal C functions don't show up in the stack traces, e.g. runtime.nanotime, which is called by time.Since in the example program.
  • The current implementation relies on the Go scheduler to schedule the internal goroutine at a fixed sample rate. Scheduler delays, especially biased ones, might cause inaccuracies.
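The second issue can be checked empirically. This sketch measures how far the wake-ups of a 99 Hz ticker drift from the ideal interval, first on an otherwise idle process and then while CPU-bound goroutines compete for the scheduler. tickJitter is a hypothetical helper; it only approximates the kind of delay a sampling goroutine could see and does not measure fgprof itself.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// tickJitter runs a ticker at hz for n ticks and returns the largest deviation
// between the observed wake-up interval and the ideal interval of 1s/hz.
func tickJitter(hz, n int) time.Duration {
	interval := time.Second / time.Duration(hz)
	t := time.NewTicker(interval)
	defer t.Stop()
	var worst time.Duration
	prev := time.Now()
	for i := 0; i < n; i++ {
		<-t.C
		now := time.Now() // when this goroutine actually got to run
		dev := now.Sub(prev) - interval
		if dev < 0 {
			dev = -dev
		}
		if dev > worst {
			worst = dev
		}
		prev = now
	}
	return worst
}

func main() {
	fmt.Println("idle worst deviation:  ", tickJitter(99, 50))

	// Saturate the scheduler with busy goroutines, then measure again.
	stop := make(chan struct{})
	for i := 0; i < 4*runtime.NumCPU(); i++ {
		go func() {
			for {
				select {
				case <-stop:
					return
				default:
				}
			}
		}()
	}
	fmt.Println("loaded worst deviation:", tickJitter(99, 50))
	close(stop)
}
```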

Credits

The following articles helped me learn more about how profilers in general, and the Go profiler in particular, work.

License

fgprof is licensed under the MIT License.
