CompareCoverage
CompareCoverage (CmpCov for short) is a simple instrumentation module for C/C++ programs and libraries. It extracts information about data comparisons taking place in the code at run time, and saves it to disk in the form of standard .sancov files. It is based on the SanitizerCoverage instrumentation available in the clang compiler, which itself is closely related to AddressSanitizer. Specifically, the library implements the instrumentation callbacks defined by the Tracing data flow feature of SanitizerCoverage.
The tool works similarly to how "regular" code coverage information is saved by SanitizerCoverage when the target is compiled with the -fsanitize-coverage=trace-pc-guard flag. The output generated by this tool is complementary to the basic edge-based coverage, and is meant to be used as a sub-instruction profiling instrument, which makes it possible for fuzzers to progress through 16/32/64-bit constants and textual strings expected in the input stream. For reference, see e.g.:
- http://taviso.decsystem.org/making_software_dumber.pdf
- https://lafintel.wordpress.com/2016/08/15/circumventing-fuzzing-roadblocks-with-compiler-transformations/
In various forms, similar instrumentation is employed in the afl, libFuzzer and honggfuzz fuzzers. CompareCoverage may prove useful when coupled with custom, dedicated fuzzers outside of the above list.
Building
Makefiles for both Windows and GNU/Linux are provided. The end result is a static library which can be linked into your target software.
Note: The library is written in C++. When linking with software written in C, it might be necessary to add an extra -lstdc++ flag to the linker command line.
Linux
On Linux, libcmpcov.a is generated as shown below:
$ make -f Makefile.linux
clang++ -c -o cmpcov.o cmpcov.cc -O2 -fPIC
clang++ -c -o common.o common.cc -O2 -fPIC
clang++ -c -o modules.o modules.cc -O2 -fPIC
clang++ -c -o tokenizer.o tokenizer.cc -O2 -fPIC
clang++ -c -o traces.o traces.cc -O2 -fPIC
ar cr libcmpcov.a cmpcov.o common.o modules.o tokenizer.o traces.o
$
To build a program with AddressSanitizer, SanitizerCoverage and CompareCoverage, add the -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp flags to the compilation step (e.g. CFLAGS or CXXFLAGS), and -fsanitize=address -Wl,--whole-archive -L/cmpcov/directory/path -lcmpcov -Wl,--no-whole-archive to the linking step (e.g. LDFLAGS):
$ clang++ -c test.cc -o test.o -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp
$ clang++ test.o -o test -fsanitize=address -Wl,--whole-archive -L../cmpcov -lcmpcov -Wl,--no-whole-archive
$
Windows
On Windows, cmpcov.lib is built as follows:
>make -f Makefile.win
clang-cl -c -o cmpcov.o cmpcov.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o common.o common.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o modules.o modules.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o tokenizer.o tokenizer.cc -O2 -Wno-deprecated-declarations
clang-cl -c -o traces.o traces.cc -O2 -Wno-deprecated-declarations
llvm-lib /out:cmpcov.lib cmpcov.o common.o modules.o tokenizer.o traces.o
>
To build the target software with the complete instrumentation, add the -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp flags to the compiler command line, and -fsanitize=address -L/cmpcov/directory/path -lcmpcov to the linking stage, e.g.:
>clang++ -c test.cc -o test.o -fsanitize=address -fsanitize-coverage=trace-pc-guard,trace-cmp
>clang++ test.o -o test.exe -fsanitize=address -lcmpcov -L../cmpcov
>
Usage
CmpCov is generally controlled by the same ASAN_OPTIONS environment variable as SanitizerCoverage, and it currently supports two flags: coverage and coverage_dir. For example, to enable dumping the coverage information to disk, and have it saved in the logs directory, you can start your tested program as follows:
$ ASAN_OPTIONS=coverage=1,coverage_dir=logs ./test <<< "The quick"
CmpSanitizerCoverage: logs/cmp.test.75048.sancov: 9 PCs written
SanitizerCoverage: logs/test.75048.sancov: 2 PCs written
$ ls logs/
cmp.test.75048.sancov test.75048.sancov
$
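When a custom fuzzer drives the target, the same invocation can be scripted. The following is a minimal Python sketch, not part of CmpCov itself: the helper names coverage_env and split_logs are hypothetical, and the only facts it relies on are the two ASAN_OPTIONS flags and the cmp. file name prefix shown above.

```python
import os
from pathlib import Path


def coverage_env(coverage_dir, base_env=None):
    """Return a copy of the environment with coverage dumping enabled.

    Note: this overwrites any ASAN_OPTIONS value already present.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ASAN_OPTIONS"] = f"coverage=1,coverage_dir={coverage_dir}"
    return env


def split_logs(coverage_dir):
    """Separate CmpCov logs (cmp.*.sancov) from regular SanitizerCoverage logs."""
    cmp_logs, edge_logs = [], []
    for path in sorted(Path(coverage_dir).glob("*.sancov")):
        target = cmp_logs if path.name.startswith("cmp.") else edge_logs
        target.append(path.name)
    return cmp_logs, edge_logs
```

A fuzzer could then run the target with something like subprocess.run(["./test"], input=b"The quick", env=coverage_env("logs")) and call split_logs("logs") afterwards to pick up the new files.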
The test program above expected the "The quick brown fox ..." string on standard input, and because we provided a few of the first valid bytes, some comparison traces were generated and saved in an extra log file with a name starting with cmp. The more matching bytes there are at the beginning of a memory buffer or variable, the more traces are generated. The format of the output files is equivalent to that of typical .sancov files, and consists of a 64-bit header denoting the width of subsequent items (32/64-bit), followed by the traces themselves:
$ hexdump -C logs/test.75048.sancov
00000000 64 ff ff ff ff ff bf c0 81 e1 52 00 00 00 00 00 |d.........R.....|
00000010 7a e2 52 00 00 00 00 00 |z.R.....|
00000018
$ hexdump -C logs/cmp.test.75048.sancov
00000000 64 ff ff ff ff ff bf c0 43 e2 12 00 00 00 01 f0 |d.......C.......|
00000010 43 e2 12 00 00 00 02 f0 43 e2 12 00 00 00 03 f0 |C.......C.......|
00000020 43 e2 12 00 00 00 04 f0 43 e2 12 00 00 00 05 f0 |C.......C.......|
00000030 43 e2 12 00 00 00 06 f0 43 e2 12 00 00 00 07 f0 |C.......C.......|
00000040 43 e2 12 00 00 00 08 f0 43 e2 12 00 00 00 09 f0 |C.......C.......|
00000050
$
In 64-bit mode, the lower 48 bits contain the instruction offset within the given module, while the upper 16 bits encode information about the comparison (type, switch/case index, number of matching bytes). In 32-bit mode, it is the same value, but hashed and truncated to 32 bits. For more details, please refer to the source code.
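The layout described above can be read back with a few lines of Python. This is a minimal sketch under stated assumptions: the file is little-endian (as in the x86 dumps above), the two magic values are the ones documented for SanitizerCoverage .sancov files, and the function names parse_sancov and split_cmp_trace are hypothetical. It only splits a 64-bit trace into its lower 48 and upper 16 bits; decoding the upper field further requires the CmpCov source code.

```python
import struct

# .sancov magic values; the last byte (0x64 / 0x32) selects the item width.
MAGIC_64 = 0xC0BFFFFFFFFFFF64
MAGIC_32 = 0xC0BFFFFFFFFFFF32


def parse_sancov(data):
    """Return (item width in bits, list of items) for raw .sancov bytes."""
    if len(data) < 8:
        raise ValueError("file too short to contain the 64-bit header")
    (magic,) = struct.unpack_from("<Q", data, 0)
    if magic == MAGIC_64:
        fmt, size = "<Q", 8
    elif magic == MAGIC_32:
        fmt, size = "<I", 4
    else:
        raise ValueError(f"unexpected magic {magic:#x}")
    body = data[8:]
    if len(body) % size:
        raise ValueError("trailing bytes after the last item")
    return size * 8, [value for (value,) in struct.iter_unpack(fmt, body)]


def split_cmp_trace(value):
    """Split a 64-bit CmpCov trace into (module offset, comparison info)."""
    return value & 0xFFFFFFFFFFFF, value >> 48


if __name__ == "__main__":
    # Header plus the first two traces from the cmp.test.75048.sancov dump.
    raw = bytes.fromhex("64ffffffffffbfc0" "43e21200000001f0" "43e21200000002f0")
    width, traces = parse_sancov(raw)
    for trace in traces:
        offset, info = split_cmp_trace(trace)
        print(f"{width}-bit trace: offset={offset:#x} info={info:#06x}")
```

In the dump above, all nine traces share the same lower 48 bits and differ only in the upper 16-bit field.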
Additional TRACE_NONCONST_CMP and TRACE_MEMORY_CMP environment variables are available to control the instrumentation of non-const comparisons (off by default), and the instrumentation of memory/string functions (on by default).
The instrumentation was specifically designed to be compatible with the corpus management algorithm described in Effective File Format Fuzzing, but should work well with any other approach to corpus distillation.
Example
To better illustrate the capabilities of CmpCov and tracing data flow in general, we developed a demonstration program demo.cc, which expects the following data on standard input:
- A "The quick brown fox " string checked with memcmp,
- A "jumps over " string checked with strncmp,
- A "the lazy dog" string checked with strcmp,
- A 0xCAFEBABECAFEBABE 64-bit constant,
- A 0xDEADC0DE 32-bit constant,
- A 0xBEEF 16-bit constant.
Furthermore, we built a trivial fuzzer, which replaces subsequent bytes in the input stream with random values, until the coverage grows. A conventional fuzzer without any insight into the comparisons taking place wouldn't be able to progress through the checks. With CmpCov, all 57 bytes of input were successfully discovered in less than 4 minutes in our test run:
$ python fuzzer.py ./demo
---------- Initial coverage (2019-02-05 16:58:10, 2 traces) ----------
00000000: 26 3d 77 b7 bc bf 82 41 b4 a6 f2 c0 57 57 54 18 &=w....A....WWT.
00000010: 0c 29 01 72 e5 d4 a6 c0 ce bd b9 02 6c 87 24 48 .).r........l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d L..I.!.*.
[...]
---------- New coverage (2019-02-05 16:59:10, 24 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a d4 a6 c0 ce bd b9 02 6c 87 24 48 fox j.......l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d L..I.!.*.
---------- New coverage (2019-02-05 16:59:14, 25 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a 75 a6 c0 ce bd b9 02 6c 87 24 48 fox ju......l.$H
00000020: 7b 7d bb 34 08 60 5f 3a 0a 9a 06 ab f4 71 98 14 {}.4.`_:.....q..
00000030: 4c 84 e6 49 93 21 b0 2a 0d L..I.!.*.
[...]
---------- New coverage (2019-02-05 17:01:34, 65 traces) ----------
00000000: 54 68 65 20 71 75 69 63 6b 20 62 72 6f 77 6e 20 The quick brown
00000010: 66 6f 78 20 6a 75 6d 70 73 20 6f 76 65 72 20 74 fox jumps over t
00000020: 68 65 20 6c 61 7a 79 20 64 6f 67 be ba fe ca be he lazy dog.....
00000030: ba fe ca de c0 ad de ef be .........
$
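The idea behind such a fuzzer can be sketched in a few lines of Python. This is not the fuzzer.py used for the run above, only an illustration under stated assumptions: instead of executing an instrumented binary and counting traces in the .sancov files, it uses a hypothetical stand-in, simulated_coverage, which reports one trace per matching leading byte. Each position is tried with all 256 byte values in random order, so the loop is bounded.

```python
import random


def simulated_coverage(data, expected):
    """Stand-in for running the target: one trace per matching leading byte."""
    count = 0
    for got, want in zip(data, expected):
        if got != want:
            break
        count += 1
    return count


def fuzz(expected, coverage=simulated_coverage, seed=0):
    """Mutate one byte position at a time, keeping any value that grows coverage."""
    rng = random.Random(seed)
    data = bytearray(rng.randrange(256) for _ in range(len(expected)))
    best = coverage(data, expected)
    for pos in range(len(data)):
        original = data[pos]
        candidates = list(range(256))
        rng.shuffle(candidates)
        for value in candidates:
            data[pos] = value
            score = coverage(data, expected)
            if score > best:
                best = score  # coverage grew: keep this byte and move on
                break
        else:
            data[pos] = original  # no value helped: restore and move on
    return bytes(data), best


if __name__ == "__main__":
    result, traces = fuzz(b"The quick brown fox ")
    print(result, traces)
```

Without the per-byte feedback, the same loop would have to guess a whole constant or string at once, which is why a conventional fuzzer stalls on these checks.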
Disclaimer
This is not an official Google product.