• Stars
    star
    941
  • Rank 48,574 (Top 1.0 %)
  • Language
    C
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated almost 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A fast, parallel test case minimization tool.

Introducing halfempty 🥛


Fast, Parallel Testcase Minimization

Halfempty is a new testcase minimization tool, designed with parallelization in mind. Halfempty was built to use strategies and techniques that dramatically speed up the minimization process.

Background

Fuzzers find inputs that trigger bugs, but understanding those bugs is easier when you remove as much extraneous data as possible. This is called testcase minimization or delta debugging.

Minimization tools use various techniques to simplify the testcase, but the core algorithm is simply bisection. Bisection is an inherently serial process, you can't advance through the algorithm without knowing the result of each step. This data dependency problem can make minimization very slow, sometimes taking hours to complete while cpu cores sit idle.

Bisection

In this diagram you can see we progressively remove parts of the file to determine which section is interesting.

halfempty solves this problem using pessimistic speculative execution. We build a binary tree of all the possible bisection steps and then idle cores can speculatively test future steps ahead of our position in the algorithm. In many cases, the test results are already known by the time we need them.

We call it pessimistic, because real workloads are characterized by long series of consecutive failures. We simply assume that tests are going to fail, and speculatively follow the failure path until proven wrong.

Tree

In this diagram, you can see we generated a binary tree of all possible outcomes, and now idle cores can speculatively work ahead of the main thread.

If you're fuzzing a target that takes more than a few seconds to run then parallelizing the minimization can dramatically speedup your workflow. Real fuzzing inputs that take several seconds to reproduce can take many hours to complete using serial bisection, but halfempty can produce the same output in minutes.

In real tests, the author often finds speedup exceeding hours.

Path Path Full

This is a real minimization path from a fuzzer generated crash.

Halfempty generates a binary tree, and this graph shows the path through the tree from the root to the final leaf (discarded paths are hidden on the left to simplify the diagram).

The green nodes were successful and the red nodes were failures. The grey nodes in the right were explored but discarded. Because all consecutive red nodes are executed in parallel, the actual wall clock time required to minimize the input was minimal.

Each crash took ~11 seconds to reproduce, requiring about  34 minutes of compute time - but halfempty completed in just 6 minutes!

The original input was 240K, the final output was just 75 bytes.

Building

The only dependency is libglib2.0-dev, used for some useful data structures, like N-ary trees.

On RedHat systems, try glib2-devel.

Just type make to build the main binary.

The --monitor mode feature requires the graphviz package and a web browser.

The author has tested the following distributions:

  • CentOS 6 amd64
  • Ubuntu 14 amd64

Mac OS X

Halfempty has preliminary macOS support using homebrew.

Please use brew install pkg-config glib to install necessary dependencies, then make to build the main binary.

Usage

First, create a shell script that when given your input on stdin, returns zero.

A simple example might look like this if you wanted to test a gzip crash:

#!/bin/sh

gzip -dc

# Check if we were killed with SIGSEGV
if test $? -eq 139; then
    exit 0 # We want this input
else
    exit 1 # We don't want this input
fi

Make the file executable and verify it works:

$ chmod +x testgzip.sh
$ ./testgzip.sh < crashinput.gz && echo success || echo failure
success

Now simply run halfempty with your input and it will find the smallest version that still returns zero.

Note: If you need to create temporary files, see some advanced examples in the documentation.

$ halfempty testgzip.sh crashinput.gz

If everything worked, there should be a minimal output file in halfempty.out.

Screenshot

If you want to monitor what halfempty is doing, you can use --monitor mode, which will generate graphs you can watch in realtime. halfempty will generate a URL you can open, and you can view the data in your web browser.

Note: --monitor mode requires the graphviz package to be installed.

Screenshot

Options

Halfempty includes many options to fine tune the execution environment for the child processes, and tweak performance options. The full documentation can be shown with --help-all, but here are the most commonly useful parameters.

Parameter Description
--num-threads=threads Halfempty will default to using all available cores, but you can tweak this if you prefer.
--stable Sometimes different strategies can shake out new potential for minimizing.
If you enable this, halfempty will repeat all strategies until the output doesn't change.
(Slower, but recommended).
--timeout=seconds If tested programs can run too long, we can send them a SIGALRM.
You can catch this in your test script (see help trap) and cleanup if you like, or accept the default action and terminate.
--limit RLIMIT_???=N You can fine tune the resource limits available to child processes.
Perhaps you want to limit how much memory they can allocate, or enable core dumps.
An example might be --limit RLIMIT_CPU=600
--inherit-stdout
--inherit-stderr
By default, we discard all output from children.
If you want to see the output instead, you can disable this and you can see child error messages.
--zero-char=byte Halfempty tries to simplify files by overwriting data with nul bytes. This makes sense for binary file formats.
If you're minimizing text formats (html, xml, c, etc) then you might want whitespace instead.
Set this to 0x20 for space, or 0x0a for a newline.
--monitor If you have the graphviz package installed, halfempty can generate graphs so you watch the progress.
--no-terminate If halfempty guesses wrong, it might already be running your test on an input we know we don't need.
By default, we will try to kill it so we can get back to using that thread sooner.
You can disable this if you prefer.
--output=filename By default your output is saved to halfempty.out, but you can save it anywhere you like.
--noverify If tests are very slow, you can skip the initial verification and go straight to parallelization.
(Faster, but not recommended).
--generate-dot Halfempty can generate a dot file of the final tree state that you can inspect with xdot.
--gen-intermediate Save the best result as it's found, so you don't lose your progress if halfempty is interrupted.

Examples

There are more examples available in the wiki.

Creating temporary files

Note: Are you sure you need temporary files? Many programs will accept /dev/stdin.

If you need to create temporary files to give to your target program, you can simply do something like this.

#!/bin/sh
tempfile=`mktemp` && cat > ${tempfile}

yourprogram ${tempfile}

Remember to clean it up when you're done, you can do this if you like:

#!/bin/sh
tempfile=`mktemp` && cat > ${tempfile}
result=1

trap 'rm -f ${tempfile}; exit ${result}' EXIT TERM ALRM

yourprogram ${tempfile}

if test $? -eq 139; then
    result=0
fi

Verifying crashes

Sometimes your target program might crash with a different crash accidentally found during minimization. One solution might be to use gdb to verify the crash site.

#!/bin/sh
exec gdb -q                                                                 \
         -ex 'r'                                                            \
         -ex 'q !($_siginfo.si_signo == 11 && $pc == 0x00007ffff763f2e7)'   \
         -ex 'q 1'                                                          \
         --args yourprogram --yourparams

This will exit 0 if the signal number and crash address match, or 1 otherwise.

You can test various things such as registers ($rip, $eax, etc), fault address ($_siginfo._sifields._sigfault.si_addr), and many more. If you want to see more things you can test, try the command show conv in gdb.

FAQ

Q. What does finalized mean in halfempty output?

A. Halfempty works by guessing what the results of tests will be before the real result is known. If the path through the bisection tree from the root node to the final leaf was entirely through nodes where we knew the result, then the path is finalized (as opposed to pending).

Q. Where does the name come from?

A. We use pessimistic speculative execution, so the glass is always half empty? ....? Sorry. 🥛

Q. How can I kill processes that take too long?

A. Use --timeout 10 to send a signal that can be caught after 10 seconds, or --limit RLIMIT_CPU=10 to enforce a hard limit.

Q. Halfempty wastes a lot of CPU time exploring paths, so is it really faster?

A. It's significantly faster in real time (i.e. wall clock time), that's what counts!

Q. I have a very large input, what do I need to know?

A. Halfempty is less thorough by default on very large inputs that don't seem to minimize well. Removing each byte from multi-gigabyte inputs just takes too long, even when run in parallel.

If you really want halfempty to be thorough, you can do this:

$ halfempty --bisect-skip-multiplier=0 --zero-skip-multiplier=0 --stable --gen-intermediate harness.sh input.bin

  • --bisect-skip-multiplier=0 and --zero-skip-multiplier=0 means to try removing every single byte.
  • --stable means to keep retrying minimization until it no further removals work.
  • --gen-intermediate means to save the best result as it's found, so you won't lose your work if you change your mind.

On the other hand, if you just want halfempty to be faster and don't care if it's not very thorough, you can do the opposite. Something like this:

$ halfempty --bisect-skip-multiplier=0.01 --zero-skip-multiplier=0.01 harness.sh input.bin

The reasonable range for the multiplier is 0 to 0.1.

BUGS

  • If your program intercepts signals or creates process groups, it might be difficult to cleanup.
  • For very long trees, we keep an fd open for each successful node. It's possible we might exhaust fds.

Please report more bugs or unexpected results to [email protected]. The author intends to maintain this tool and make it a stable and reliable component of your fuzzing workflow.

Better quality bug reports require simpler reproducers, and that requires good quality tools.

FUTURE

  • The next version will allow the level of pessimism to be controlled at runtime.

AUTHORS

Tavis Ormandy [email protected]

LICENSE

Apache 2.0, See LICENSE file for details.

NOTICE

This is not an officially supported Google product.

More Repositories

1

winafl

A fork of AFL for fuzzing Windows binaries
C
2,311
star
2

sandbox-attacksurface-analysis-tools

Set of tools to analyze Windows sandboxes for exposed attack surface.
C#
2,047
star
3

fuzzilli

A JavaScript Engine Fuzzer
Swift
1,859
star
4

weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Rust
1,857
star
5

domato

DOM fuzzer
Python
1,672
star
6

TinyInst

A lightweight dynamic instrumentation library
C++
1,158
star
7

Jackalope

Binary, coverage-guided fuzzer for Windows, macOS, Linux and Android
C++
1,068
star
8

0days-in-the-wild

Repository for information about 0-days exploited in-the-wild.
HTML
753
star
9

symboliclink-testing-tools

C++
747
star
10

p0tools

Project Zero Docs and Tools
C++
698
star
11

ktrw

An iOS kernel debugger based on a KTRR bypass for A11 iPhones; works with LLDB and IDA Pro.
C
660
star
12

functionsimsearch

Some C++ example code to demonstrate how to perform code similarity searches using SimHashing.
C++
559
star
13

BrokenType

TrueType and OpenType font fuzzing toolset
C++
430
star
14

iOS-messaging-tools

Python
368
star
15

SockFuzzer

C
367
star
16

SkCodecFuzzer

Fuzzing harness for testing proprietary image codecs supported by Skia on Android
C++
331
star
17

bochspwn

A Bochs-based instrumentation project designed to log kernel memory references, to identify "double fetches" and other OS vulnerabilities
C++
319
star
18

bochspwn-reloaded

A Bochs-based instrumentation performing kernel memory taint tracking to detect disclosure of uninitialized memory to ring 3
C++
284
star
19

Street-Party

Street Party is a suite of tools that allows the RTP streams of video conferencing implementations to be viewed and modified.
C++
242
star
20

DrSancov

DynamoRIO plugin to get ASAN and SanitizerCoverage compatible output for closed-source executables
C++
203
star
21

CompareCoverage

Clang instrumentation module for tracing variable and buffer comparisons in C/C++ and saving the coverage data to .sancov files
C++
200
star
22

Hyntrospect

PowerShell
179
star
23

reil

C++
59
star
24

.allstar

1
star
25

.github

1
star