  • Stars: 129
  • Rank 279,262 (Top 6 %)
  • Language: Rust
  • License: MIT License
  • Created almost 3 years ago
  • Updated 10 months ago

Repository Details

High-performance fuzzing using RISC-V to x86 binary translation and modern fuzzing techniques

SFUZZ

Start date: Dec, 2021

This is a coverage-guided, emulation-based greybox fuzzer that uses a custom just-in-time (JIT) compiler to achieve near-native performance. It works by lifting RISC-V ELF binaries to an intermediate representation and then JIT compiling them to x86 during execution. During JIT compilation the code is instrumented to enable fuzzing improvements such as coverage tracking, ASan, CmpCov, and snapshot-based fuzzing.
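As an illustration of what CmpCov-style instrumentation can compute at a comparison, here is a minimal sketch. It is not taken from the sfuzz source: the names `matching_bytes` and `CmpCov`, the restriction to 32-bit operands, and the per-site bookkeeping are all assumptions made for this example. The idea is to reward an input whenever it matches more bytes of a compared constant than any earlier input did, so the fuzzer can solve a magic value one byte at a time.

```rust
use std::collections::HashMap;

/// Number of bytes, counted from the least significant byte, in which the
/// two operands agree before the first mismatch (4 means fully equal).
pub fn matching_bytes(a: u32, b: u32) -> u32 {
    let diff = a ^ b;
    if diff == 0 {
        4
    } else {
        // The lowest set bit of `diff` marks the first differing byte.
        diff.trailing_zeros() / 8
    }
}

/// Hypothetical per-comparison-site progress tracker (illustrative only).
#[derive(Default)]
pub struct CmpCov {
    /// Best number of matching bytes seen so far, keyed by the address of
    /// the compare instruction.
    best: HashMap<u64, u32>,
}

impl CmpCov {
    /// Record one executed comparison. Returns true if this input made
    /// more progress at this site than any previous input, which a fuzzer
    /// could treat as new coverage.
    pub fn record(&mut self, pc: u64, a: u32, b: u32) -> bool {
        let matched = matching_bytes(a, b);
        let best = self.best.entry(pc).or_insert(0);
        if matched > *best {
            *best = matched;
            true
        } else {
            false
        }
    }
}
```

With this feedback, an input that gets only the low byte of `0xdeadbeef` right is kept in the corpus, and later mutations can build on it instead of having to guess all four bytes at once.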


Features

  • Multi-threaded, supporting an arbitrary number of threads and scaling almost linearly
  • Custom JIT compiler for high performance and, more importantly, customizability that is harder to achieve with other solutions such as QEMU
  • Custom memory management unit that allows further customization and enables highly beneficial features such as byte-level permission checks and dirty-bit based emulator resets. Hooks that allow safe usage of heap routines are also implemented.
  • Virtualized file management to allow easy in-memory fuzzing
  • Snapshot-based fuzzing: a target's memory/register state can be snapshotted during execution, and all future fuzz cases start from this baseline
  • Edge-level coverage tracking, and coverage-guided fuzzing based on this feedback
  • Various mutators, crash deduplication, and a simple seed scheduling algorithm
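To make the edge-level coverage feature more concrete, the following is a minimal sketch of one common way to track edges, in the style popularized by AFL: hash the current block address, combine it with the previous block, and use the result as an index into a hit-count bitmap. This is an assumption for illustration, not sfuzz's actual scheme; `Coverage`, `record_edge`, `hash_pc`, and `MAP_SIZE` are hypothetical names.

```rust
/// Size of the coverage bitmap; a power of two so indices can be masked.
const MAP_SIZE: usize = 1 << 16;

/// Hypothetical per-thread coverage state (illustrative, not sfuzz's types).
pub struct Coverage {
    bitmap: Vec<u8>,
    prev_block: usize,
}

/// Spread block addresses over the bitmap with a multiplicative hash,
/// keeping the top 16 bits of the product.
fn hash_pc(pc: u64) -> usize {
    (pc.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 48) as usize
}

impl Coverage {
    pub fn new() -> Self {
        Coverage { bitmap: vec![0; MAP_SIZE], prev_block: 0 }
    }

    /// Call at the start of every fuzz case so the first edge of a run
    /// does not depend on where the previous run ended.
    pub fn start_case(&mut self) {
        self.prev_block = 0;
    }

    /// Record the edge from the previously executed block to `cur_pc`.
    /// Returns true if this edge was never seen before.
    pub fn record_edge(&mut self, cur_pc: u64) -> bool {
        let cur = hash_pc(cur_pc);
        let idx = (cur ^ self.prev_block) & (MAP_SIZE - 1);
        // Shifting the stored value makes A->B and B->A land on
        // different bitmap entries.
        self.prev_block = cur >> 1;
        let is_new = self.bitmap[idx] == 0;
        self.bitmap[idx] = self.bitmap[idx].saturating_add(1);
        is_new
    }
}
```

In a JIT-based design, the equivalent of `record_edge` would be emitted inline at the start of each translated block rather than called as a function. An input that produces at least one new edge would then be added to the corpus.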

Description

The objective of this project is to highlight the benefits of using an emulated environment for fuzzing. Many emulation-based fuzzers already exist, but almost all of them rely on the QEMU emulation engine. While this engine has a fairly mature just-in-time compiler and generates very good code, it is not designed for fuzzing. During fuzzing, we intend to run the same process thousands of times per second. This makes room for specialized optimizations that QEMU does not make strong use of, such as reusing the same memory space for each process run and only resetting the memory that was actually modified, tracked via dirty bits.
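The dirty-bit reset idea can be sketched as follows. This is a simplified model under stated assumptions, not sfuzz's implementation: the `Mmu` type, the 64-byte `DIRTY_BLOCK_SIZE`, and the permission constants are hypothetical. Every write marks the blocks it touches as dirty, and a reset copies back only those blocks from the snapshot instead of the whole address space. The sketch also stores one permission byte per memory byte to show how byte-level permission checks fit into the same structure.

```rust
pub const PERM_READ: u8 = 1 << 0;
pub const PERM_WRITE: u8 = 1 << 1;
/// Granularity of dirty tracking (an arbitrary choice for this sketch).
const DIRTY_BLOCK_SIZE: usize = 64;

/// Hypothetical emulator memory with byte-level permissions.
#[derive(Clone)]
pub struct Mmu {
    memory: Vec<u8>,
    perms: Vec<u8>,          // one permission byte per memory byte
    dirty: Vec<usize>,       // indices of blocks modified since the snapshot
    dirty_bitmap: Vec<bool>, // avoids pushing the same block twice
}

impl Mmu {
    pub fn new(size: usize) -> Self {
        let blocks = (size + DIRTY_BLOCK_SIZE - 1) / DIRTY_BLOCK_SIZE;
        Mmu {
            memory: vec![0; size],
            perms: vec![0; size],
            dirty: Vec::new(),
            dirty_bitmap: vec![false; blocks],
        }
    }

    /// Mark every block overlapping the non-empty range [start, end) dirty.
    fn mark_dirty(&mut self, start: usize, end: usize) {
        for block in start / DIRTY_BLOCK_SIZE..=(end - 1) / DIRTY_BLOCK_SIZE {
            if !self.dirty_bitmap[block] {
                self.dirty_bitmap[block] = true;
                self.dirty.push(block);
            }
        }
    }

    pub fn set_perms(&mut self, addr: usize, len: usize, perm: u8) -> Option<()> {
        if len == 0 { return Some(()); }
        let end = addr.checked_add(len)?;
        self.perms.get_mut(addr..end)?.fill(perm);
        self.mark_dirty(addr, end);
        Some(())
    }

    /// Fails (returns None) if any byte is out of bounds or not writable.
    pub fn write(&mut self, addr: usize, data: &[u8]) -> Option<()> {
        if data.is_empty() { return Some(()); }
        let end = addr.checked_add(data.len())?;
        let perms = self.perms.get(addr..end)?;
        if !perms.iter().all(|p| p & PERM_WRITE != 0) { return None; }
        self.memory[addr..end].copy_from_slice(data);
        self.mark_dirty(addr, end);
        Some(())
    }

    /// Fails (returns None) if any byte is out of bounds or not readable.
    pub fn read(&self, addr: usize, len: usize) -> Option<&[u8]> {
        let end = addr.checked_add(len)?;
        let perms = self.perms.get(addr..end)?;
        if !perms.iter().all(|p| p & PERM_READ != 0) { return None; }
        Some(&self.memory[addr..end])
    }

    /// Clear the dirty state and return a copy to use as the baseline.
    pub fn snapshot(&mut self) -> Mmu {
        for &block in &self.dirty { self.dirty_bitmap[block] = false; }
        self.dirty.clear();
        self.clone()
    }

    /// Restore only the dirty blocks; returns how many blocks were copied.
    pub fn reset(&mut self, snapshot: &Mmu) -> usize {
        let restored = self.dirty.len();
        for &block in &self.dirty {
            let start = block * DIRTY_BLOCK_SIZE;
            let end = (start + DIRTY_BLOCK_SIZE).min(self.memory.len());
            self.memory[start..end].copy_from_slice(&snapshot.memory[start..end]);
            self.perms[start..end].copy_from_slice(&snapshot.perms[start..end]);
            self.dirty_bitmap[block] = false;
        }
        self.dirty.clear();
        restored
    }
}
```

The cost of `reset` is proportional to the number of blocks a fuzz case touched, not to the size of the address space, which is why this approach pays off most for short fuzz cases.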

In many ways, this is more of a proof of concept that I wanted to work on to learn about compiler internals and to have an emulation-based playground for experimenting with various fuzzing techniques such as different coverage metrics, seed schedulers, and snapshot-based fuzzing. With more JIT optimizations and, most importantly, extensions to support more popular architectures such as MIPS or ARM, it could certainly be used to efficiently fuzz closed-source code that cannot simply be instrumented through recompilation.

In the testing I have done so far, sfuzz has shown significantly less overhead than many other popular fuzzers, which makes it very fast, especially on small fuzz cases.

More details on the features of this fuzzer and the choices behind them can be found in the accompanying blog post (https://seal9055.com/blog/fuzzing/sfuzz) and in the project's documentation files.

Usage

The entire fuzzer is written in Rust, so after cloning the repository, just run cargo build --release to compile it.

Since the fuzzer currently only supports RISC-V, the target needs to be compiled to RISC-V using the toolchain described below (or a similar one). Alternatively, if you already have a RISC-V binary, that will work fine too.

Once this is set up, just create input/output directories, add some initial seed files to the input directory and start up the fuzzer.

./sfuzz -i in -o out -- ./test_cases/simple_test @@

Additional command-line options can be used to specify the number of threads, enable snapshot fuzzing, add a dictionary to the mutator, etc. Run sfuzz with the -h flag to list all available options.

If you wish to test the fuzzer against targets of varying complexity, the program_generator at tools/program_generator can be used to generate such programs automatically. Note that you will need a RISC-V toolchain to compile the generated targets.

RISC-V toolchain to compile binaries for the fuzzer

This sets up a toolchain to compile RISC-V binaries that can be loaded and used by this project.

RISC-V compiler/tooling:
    sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev \
    libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev \
    libexpat-dev
    git clone https://github.com/riscv/riscv-gnu-toolchain && cd riscv-gnu-toolchain
    ./configure --prefix=/opt/riscv --with-arch=rv64i
    sudo make

Debugger:
    gdb-multiarch

TODO

This list represents a set of features that I plan on implementing in the future.

  • Working Memory management unit
  • JIT Compiler
  • Virtualized files for in-memory fuzzing
  • Byte level permission checks + hooked/safe allocators
  • Track edge level coverage
  • Persistent mode to fuzz in small loops around target functions
  • Crash deduping / unique crashes
  • Update mutators to include more options
  • Seed Scheduling
  • CmpCov to get past magic values and checksums
  • Add some tooling around the fuzzer
  • Proper benchmarking
  • Implement RISC-V M & A extensions, so that the JIT can use glibc instead of newlib
  • Replace assembler to improve compilation speed
  • Support more architectures (e.g. MIPS, ARM)
  • JIT optimizations, and another attempt at register allocation

References