• This repository has been archived on 16/May/2023
  • Language
    C++
  • License
    Apache License 2.0

Clang instrumentation module for tracing variable and buffer comparisons in C/C++ and saving the coverage data to .sancov files

CompareCoverage

CompareCoverage (CmpCov for short) is a simple instrumentation module for C/C++ programs and libraries. It extracts information about the data comparisons that take place in the code at run time and saves it to disk in the form of standard .sancov files. It is based on the SanitizerCoverage instrumentation available in the clang compiler, which itself is tightly related to AddressSanitizer. Specifically, the library implements the instrumentation callbacks defined by the Tracing data flow feature of SanitizerCoverage.
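
For orientation, the SanitizerCoverage documentation defines the data-flow callbacks as plain C functions that the compiler calls before each comparison, for example `__sanitizer_cov_trace_cmp4(uint32_t, uint32_t)` for 32-bit operands. The sketch below is not CmpCov's implementation. It only illustrates how such a callback could turn a comparison into a "number of matching bytes" metric. `MatchingLowBytes` and `g_best_match` are hypothetical names, and counting from the least-significant byte is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical helper: number of equal bytes in a and b, counted from the
// least-significant byte up to the first mismatch (an assumed convention).
int MatchingLowBytes(std::uint32_t a, std::uint32_t b) {
  int matched = 0;
  for (int i = 0; i < 4; ++i) {
    const std::uint8_t byte_a = static_cast<std::uint8_t>(a >> (8 * i));
    const std::uint8_t byte_b = static_cast<std::uint8_t>(b >> (8 * i));
    if (byte_a != byte_b) break;
    ++matched;
  }
  return matched;
}

// Hypothetical state: the best match seen so far across all comparisons.
int g_best_match = 0;

// Callback with the signature documented for -fsanitize-coverage=trace-cmp.
// Compile this file without -fsanitize-coverage so that the callback is not
// itself instrumented.
extern "C" void __sanitizer_cov_trace_cmp4(std::uint32_t arg1,
                                           std::uint32_t arg2) {
  const int matched = MatchingLowBytes(arg1, arg2);
  if (matched > g_best_match) g_best_match = matched;
}
```

A real module would record one trace per comparison site and match count rather than a single global maximum; the global is used here only to keep the sketch short.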

The tool works similarly to how "regular" code coverage information is saved by SanitizerCoverage when the target is compiled with the -fsanitize-coverage=trace-pc-guard flag. The output generated by this tool is complementary to the basic edge-based coverage, and is meant to be used as a sub-instruction profiling instrument, which makes it possible for fuzzers to progress through 16/32/64-bit constants and textual strings expected in the input stream. For reference, see e.g.:

  1. http://taviso.decsystem.org/making_software_dumber.pdf
  2. https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblocks-with-compiler-transformations/

In various forms, similar instrumentation is employed in the afl, libFuzzer and honggfuzz fuzzers. CompareCoverage may prove useful when coupled with custom, dedicated fuzzers outside of the above list.

Building

Makefiles for both Windows and GNU/Linux are provided. The end result is a static library which can be linked into your target software.

Note: The library is written in C++. When linking with software written in C, it might be necessary to add an extra -lstdc++ flag to the linker command line.

Linux

On Linux, libcmpcov.a is generated as shown below:

$ make -f Makefile.linux
clang++ -c -o cmpcov.o cmpcov.cc -O2 -fPIC
clang++ -c -o common.o common.cc -O2 -fPIC
clang++ -c -o modules.o modules.cc -O2 -fPIC
clang++ -c -o tokenizer.o tokenizer.cc -O2 -fPIC
clang++ -c -o traces.o traces.cc -O2 -fPIC
ar cr libcmpcov.a cmpcov.o common.o modules.o tokenizer.o traces.o
$

To build a program with AddressSanitizer, SanitizerCoverage and CompareCoverage, add the -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp flags to the compilation step (e.g. CFLAGS or CXXFLAGS), and -fsanitize=address -Wl,--whole-archive -L/cmpcov/directory/path -lcmpcov -Wl,--no-whole-archive to the linking step (e.g. LDFLAGS):

$ clang++ -c test.cc -o test.o -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp
$ clang++ test.o -o test -fsanitize=address -Wl,--whole-archive -L../cmpcov -lcmpcov -Wl,--no-whole-archive
$

Windows

Compilation of cmpcov.lib is achieved as follows:

>make -f Makefile.win
clang-cl -c -o cmpcov.o cmpcov.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o common.o common.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o modules.o modules.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o tokenizer.o tokenizer.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o traces.o traces.cc -O2 -Wno-deprecated-declarations
llvm-lib /out:cmpcov.lib cmpcov.o common.o modules.o tokenizer.o traces.o
>

To build the target software with the complete instrumentation, add the -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp flags to the compiler command line, and -fsanitize=address -L/cmpcov/directory/path -lcmpcov to the linking stage, e.g.:

>clang++ -c test.cc -o test.o -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp
>clang++ test.o -o test.exe -fsanitize=address -lcmpcov -L../cmpcov
>

Usage

CmpCov is generally controlled by the same ASAN_OPTIONS environment variable as SanitizerCoverage, and it currently supports two flags: coverage and coverage_dir. For example, to enable dumping the coverage information to disk, and have it saved in the logs directory, you can start your tested program as follows:

$ ASAN_OPTIONS=coverage=1,coverage_dir=logs ./test <<< "The quick"
CmpSanitizerCoverage: logs/cmp.test.75048.sancov: 9 PCs written
SanitizerCoverage: logs/test.75048.sancov: 2 PCs written
$ ls logs/
cmp.test.75048.sancov  test.75048.sancov
$
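
As a rough illustration of the option string used above, the following sketch splits a comma-separated `key=value` list and builds an output file name of the shape seen in the sample run. It is not CmpCov's actual parser. `ParseOptions` and `CmpCovOutputPath` are hypothetical helpers, and the `cmp.<module>.<pid>.sancov` pattern is inferred from the example output rather than taken from the source code.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits "key=value,key=value" into a map.
// Only the comma separator used in the example above is handled.
std::map<std::string, std::string> ParseOptions(const std::string& options) {
  std::map<std::string, std::string> result;
  std::istringstream stream(options);
  std::string item;
  while (std::getline(stream, item, ',')) {
    const std::size_t eq = item.find('=');
    if (eq == std::string::npos) continue;  // skip items without a value
    result[item.substr(0, eq)] = item.substr(eq + 1);
  }
  return result;
}

// Hypothetical helper: builds "<dir>/cmp.<module>.<pid>.sancov", the naming
// pattern inferred from the sample run (an assumption, not a specification).
std::string CmpCovOutputPath(const std::string& dir, const std::string& module,
                             int pid) {
  return dir + "/cmp." + module + "." + std::to_string(pid) + ".sancov";
}
```

With the values from the example run, `ParseOptions("coverage=1,coverage_dir=logs")` yields `coverage` mapped to `1` and `coverage_dir` mapped to `logs`, and `CmpCovOutputPath("logs", "test", 75048)` gives `logs/cmp.test.75048.sancov`.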

The test program above expected the "The quick brown fox ..." string on standard input, and because we provided a few of the first valid bytes, some comparison traces were generated and saved in an extra log file with a name starting with cmp. The more matching bytes there are at the beginning of a memory buffer or variable, the more traces are generated. The format of the output files is equivalent to that of typical .sancov files, and consists of a 64-bit header denoting the width of subsequent items (32/64-bit), followed by the traces themselves:

$ hexdump -C logs/test.75048.sancov
00000000  64 ff ff ff ff ff bf c0  81 e1 52 00 00 00 00 00  |d.........R.....|
00000010  7a e2 52 00 00 00 00 00                           |z.R.....|
00000018
$ hexdump -C logs/cmp.test.75048.sancov
00000000  64 ff ff ff ff ff bf c0  43 e2 12 00 00 00 01 f0  |d.......C.......|
00000010  43 e2 12 00 00 00 02 f0  43 e2 12 00 00 00 03 f0  |C.......C.......|
00000020  43 e2 12 00 00 00 04 f0  43 e2 12 00 00 00 05 f0  |C.......C.......|
00000030  43 e2 12 00 00 00 06 f0  43 e2 12 00 00 00 07 f0  |C.......C.......|
00000040  43 e2 12 00 00 00 08 f0  43 e2 12 00 00 00 09 f0  |C.......C.......|
00000050
$

In 64-bit mode, the lower 48 bits contain the instruction offset within the given module, while the upper 16 bits encode information about the comparison (type, switch/case index, number of matching bytes). In 32-bit mode, it is the same value, but hashed and truncated to 32 bits. For more details, please refer to the source code.
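
The layout described above can be decoded with a few lines of code. The sketch below checks the 64-bit header against the .sancov magic values, then splits each 64-bit item into its lower 48 bits and upper 16 bits. It assumes little-endian files, as in the hexdump above, and leaves the upper 16 bits undecoded because their exact sub-fields are only defined in the source code. `CmpTrace`, `ReadLE`, `ItemWidth` and `DecodeCmpTraces64` are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Header magic of .sancov files; the lowest byte denotes the item width.
constexpr std::uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
constexpr std::uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;

struct CmpTrace {
  std::uint64_t offset;  // lower 48 bits: instruction offset in the module
  std::uint16_t info;    // upper 16 bits: comparison information (raw)
};

// Reads a little-endian unsigned integer of `width` bytes (width <= 8).
std::uint64_t ReadLE(const std::uint8_t* p, std::size_t width) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < width; ++i) {
    value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  }
  return value;
}

// Returns the item width in bytes (4 or 8), or 0 for an unknown header.
std::size_t ItemWidth(const std::vector<std::uint8_t>& data) {
  if (data.size() < 8) return 0;
  const std::uint64_t magic = ReadLE(data.data(), 8);
  if (magic == kMagic64) return 8;
  if (magic == kMagic32) return 4;
  return 0;
}

// Decodes the items of a 64-bit cmp.*.sancov file held in memory.
// Returns an empty vector if the header is not the 64-bit magic.
std::vector<CmpTrace> DecodeCmpTraces64(const std::vector<std::uint8_t>& data) {
  std::vector<CmpTrace> traces;
  if (ItemWidth(data) != 8) return traces;
  for (std::size_t pos = 8; pos + 8 <= data.size(); pos += 8) {
    const std::uint64_t item = ReadLE(&data[pos], 8);
    traces.push_back({item & 0xFFFFFFFFFFFFULL,
                      static_cast<std::uint16_t>(item >> 48)});
  }
  return traces;
}
```

Applied to the first item of the dump above (`43 e2 12 00 00 00 01 f0`), this yields the offset 0x12e243 and the information word 0xf001.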

Additional TRACE_NONCONST_CMP and TRACE_MEMORY_CMP environment variables are available to control the instrumentation of non-const comparisons (off by default), and the instrumentation of memory/string functions (on by default).

The instrumentation was specifically designed to be compatible with the corpus management algorithm described in Effective File Format Fuzzing, but should work well with any other approach to corpus distillation.

Example

To better illustrate the capabilities of CmpCov and tracing data flow in general, we developed a demonstration program demo.cc, which expects the following data on standard input:

  • A "The quick brown fox " string checked with memcmp,
  • A "jumps over " string checked with strncmp,
  • A "the lazy dog" string checked with strcmp,
  • A 0xCAFEBABECAFEBABE 64-bit constant,
  • A 0xDEADC0DE 32-bit constant,
  • A 0xBEEF 16-bit constant.
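
The checks listed above could look roughly like the following. This is a hypothetical reconstruction for illustration, not the actual demo.cc: `AcceptsInput` and `ReadLE` are invented names, the fields are assumed to be laid out back to back in the order listed, and the constants are assumed to be stored in little-endian byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Reads a little-endian unsigned integer of `width` bytes (width <= 8).
std::uint64_t ReadLE(const std::uint8_t* p, std::size_t width) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < width; ++i) {
    value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  }
  return value;
}

// Hypothetical version of the demo checks: three strings followed by
// 64-bit, 32-bit and 16-bit constants.
bool AcceptsInput(const std::uint8_t* data, std::size_t size) {
  static const char kPart1[] = "The quick brown fox ";  // 20 bytes
  static const char kPart2[] = "jumps over ";           // 11 bytes
  static const char kPart3[] = "the lazy dog";          // 12 bytes
  const std::size_t len1 = sizeof(kPart1) - 1;
  const std::size_t len2 = sizeof(kPart2) - 1;
  const std::size_t len3 = sizeof(kPart3) - 1;
  if (size < len1 + len2 + len3 + 8 + 4 + 2) return false;

  const std::uint8_t* p = data;
  if (std::memcmp(p, kPart1, len1) != 0) return false;
  p += len1;
  if (std::strncmp(reinterpret_cast<const char*>(p), kPart2, len2) != 0) {
    return false;
  }
  p += len2;
  // strcmp needs a NUL-terminated string, so copy the field first.
  const std::string part3(reinterpret_cast<const char*>(p), len3);
  if (std::strcmp(part3.c_str(), kPart3) != 0) return false;
  p += len3;

  if (ReadLE(p, 8) != 0xCAFEBABECAFEBABEULL) return false;
  p += 8;
  if (ReadLE(p, 4) != 0xDEADC0DEULL) return false;
  p += 4;
  return ReadLE(p, 2) == 0xBEEFULL;
}
```

Under these assumptions the accepted input is 20 + 11 + 12 + 8 + 4 + 2 = 57 bytes long.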

Furthermore, we built a trivial fuzzer, which replaces subsequent bytes in the input stream with random values, until the coverage grows. A conventional fuzzer without any insight into the comparisons taking place wouldn't be able to progress through the checks. With CmpCov, all 57 bytes of input were successfully discovered in less than 4 minutes in our test run:

$ python fuzzer.py ./demo
---------- Initial coverage (2019-02-05 16:58:10, 2 traces) ----------
00000000: 26 3d 77 b7 bc bf 82 41 b4 a6 f2 c0 57 57 54 18 &=w....A....WWT.
00000010: 0c 29 01 72 e5 d4 a6 c0 ce bd b9 02 6c 87 24 48 .).r........l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d                      L..I.!.*.

[...]

---------- New coverage (2019-02-05 16:59:10, 24 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a d4 a6 c0 ce bd b9 02 6c 87 24 48 fox j.......l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d                      L..I.!.*.

---------- New coverage (2019-02-05 16:59:14, 25 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a 75 a6 c0 ce bd b9 02 6c 87 24 48 fox ju......l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d                      L..I.!.*.

[...]

---------- New coverage (2019-02-05 17:01:34, 65 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a 75 6d 70 73 20 6f 76 65 72 20 74 fox jumps over t
00000020: 68 65 20 6c 61 7a 79 20 64 6f 67 be ba fe ca be he lazy dog.....
00000030: ba fe ca de c0 ad de ef be                      .........

$
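
The mutation strategy described above can be sketched as follows. This is not the fuzzer.py used for the run shown here. `FuzzByteByByte`, `CoverageFn` and `MatchingPrefix` are hypothetical names, the coverage oracle is a stand-in that counts matching leading bytes instead of running an instrumented target, and each position tries the 256 possible values in random order, a simplification chosen so that the sketch always terminates.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical coverage oracle: returns the number of traces for an input.
using CoverageFn =
    std::function<std::size_t(const std::vector<std::uint8_t>&)>;

// Walks over the input one byte at a time. At each position, random values
// are tried until the coverage grows; if no value helps, the original byte
// is restored. Returns the number of oracle executions.
std::size_t FuzzByteByByte(std::vector<std::uint8_t>& input,
                           const CoverageFn& coverage, std::mt19937& rng) {
  std::size_t best = coverage(input);
  std::size_t execs = 1;
  std::vector<int> candidates(256);
  std::iota(candidates.begin(), candidates.end(), 0);
  for (std::size_t pos = 0; pos < input.size(); ++pos) {
    const std::uint8_t saved = input[pos];
    bool grew = false;
    std::shuffle(candidates.begin(), candidates.end(), rng);
    for (int value : candidates) {
      input[pos] = static_cast<std::uint8_t>(value);
      const std::size_t current = coverage(input);
      ++execs;
      if (current > best) {
        best = current;
        grew = true;
        break;
      }
    }
    if (!grew) input[pos] = saved;
  }
  return execs;
}

// Stand-in oracle: number of leading bytes of `input` that match `target`.
std::size_t MatchingPrefix(const std::vector<std::uint8_t>& input,
                           const std::vector<std::uint8_t>& target) {
  std::size_t n = 0;
  while (n < input.size() && n < target.size() && input[n] == target[n]) ++n;
  return n;
}
```

In a real setup the oracle would run the instrumented target with `ASAN_OPTIONS=coverage=1,coverage_dir=...` and count the items in the resulting .sancov files.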

Disclaimer

This is not an official Google product.
