Overview
peafl64 is a static instrumentation tool for x64 PEs in Windows.
Static instrumentation is the practice of editing executable files and adding code to specific locations in them.
The instrumentation adds code to the start of every basic block in the binary, logging the execution flow in an AFL-compatible way.
It allows us to fuzz binaries in Usermode (using WinAFL) and Kernelmode (using kAFL) without access to their source code.
There are other ways of fuzzing Windows binaries; in this project we chose to focus on static instrumentation because it's the fastest method.
This project builds on the pe-afl tool by wmliang, with added x64 support.
Features
- Full Windows x64 binaries support
- High performance
- Supports instrumenting with process ID or thread ID filtering
- Handles relocations, exception tables, relative instructions, jump tables, imports, exports and more
- Compatible with WinAFL (headers included) and kAFL
Usage
IDA Analysis
The instrumentation script requires an IDA analysis output.
To create it, run the provided ida_dumper.py
script in IDA.
The script requires IDA 7+ and python3.8+.
Instrumentation
usage: pe_afl.py [-h] [-n] [-cb] [-tf] [-te THREAD_ENTRY] [-nt NTOSKRNL] [-e ENTRY] [-l ENLARGE] [-v] [-lf] pefile ida_dump
positional arguments:
pefile Target PE file for instrumentation
ida_dump dump.json from IDA (created by ida_dumper.py)
optional arguments:
-h, --help show this help message and exit
-n, --nop Instrument with NOPs for testing
-cb, --callback Instrument with a callback, which is in the helper driver that's written in C
-tf, --thread-filter Driver instrumentation that filters only on thread ID (must use "-te" with this option)
-te THREAD_ENTRY, --thread-entry THREAD_ENTRY
The address (RVA) of the thread's initialization function
-nt NTOSKRNL, --ntoskrnl NTOSKRNL
ntoskrnl.exe path for offset extraction (non-optional if instrumenting a driver)
-e ENTRY, --entry ENTRY
Inject code on entry point, ie. -e9090
-l ENLARGE, --enlarge ENLARGE
Enlarge factor for sections, default=4
-v, --verbose Print debug log
-lf, --logfile Print log to pe-afl64.log rather than stream
Instrumenting usermode binary with NOPs
PS pe-afl-64> python .\pe_afl.py -n C:\Work\cmd.exe C:\Work\cmd.exe.dump.json
[*] User-mode binary is being instrumented
[*] Single-thread instrument is on
[*] Preparing new sections
[*] Added section .text^
[*] Added section .cov
[*] Expanding relative jumps
[*] Expanded 3874 of 14353 branches
[*] Building address map
[*] Updating relative instructions
[*] Updating relocations...
[*] Updating Export table
[*] Updating load config
[*] Updating exception records
[*] Updating the PE headers
[*] Finalizing...
[*] Creating instrumented code
[*] Writing address mapping to C:\Work\cmd.exe.mapping.txt
[*] Updating jump tables
[*] Updated .text
...
[*] Removing temporary PE files
[*] Fixing PE checksum
[*] Instrumented binary saved to: C:\Work\cmd.instrumented.exe
Instrumenting kernelmode binary with process ID filtering
python .\pe_afl.py -l 6 -nt "C:\Work\ntoskrnl.exe" "C:\Work\driver.sys" "C:\Work\driver.sys.dump.json"
Instrumenting kernelmode binary with thread ID filtering and verbose output
python .\pe_afl.py -v -tf -te 0x40000 -l 6 -nt "C:\Work\ntoskrnl.exe" "C:\Work\ntoskrnl.exe" "C:\Work\ntoskrnl.exe.dump.json"
How to replace Windows drivers
First you'll need to check if your machine is booting using BIOS or UEFI.
For Hyper-V machines: Gen 1 machines are BIOS based, and the Gen 2 are UEFI based.
If your machine is BIOS based then you need to patch winload.exe
:
- Get a copy of winload.exe from your VM and find the function
ImgpValidateImageHash
in it - Patch the return value to always return 0 in rax. For example replace
mov eax, edi
withxor eax, eax
in the last code block of the function - Copy the patched winload to system32 folder and run
bcdedit /set path \Windows\system32\winload2.exe
If your machine is booted using UEFI then use EfiGuard util to patch winload.efi
.
Command cheatsheet if using Hyper-V Manager:
- Create a new hard drive for the Virtual Machine
- Use the supplied FAT.vhdx in the Tools folder (or create one yourself) containing the UefiShell+EfiGuard module with the new hard driver
- Change the machine's boot order, so the new hard drive is first in order
- After boot, while in the UefiShell run the following commands:
FS1:
cd EFI/Boot
Load EfiGuard.Dxe.efi
FS0:
\EFI\Boot\bootx64.efi
Then instrument your driver of choice. To load an instrumented driver on a Windows machine it must be signed, and a self-signing certificate is enough to pass the OS's demands:
# In elevated powershell terminal
$c = New-SelfSignedCertificate -Type CodeSigningCert -KeyUsage DigitalSignature -Subject 'CN=Microsoft Windows, O=Microsoft Corporation, L=Redmond, S=Washington, C=US'
Set-AuthenticodeSignature .\driver.instrumented.sys -Certificate $c -Force
If the driver you instrumented is already used by the system, use these commands (as admin) to replace the driver's file:
set NAME=mydriver.sys
icacls %NAME% /save C:\windows\temp\%NAME%.icacls
takeown /F %NAME%
icacls %NAME% /grant Everyone:F
move %NAME% %NAME%.bak
move instrumented_driver.sys %NAME%
icacls . /restore C:\windows\temp\%NAME%.icacls
Then run:
bcdedit /set recoveryenabled no
bcdedit /set nointegritychecks on
shutdown -t 0 -r
WinAFL Fuzzing
Integration with WinAFL is by compiling a harness using the provided headers.
Alongside the headers there's example.c
, which is a sample program that shows how to use them.
The provided headers are a slight modification of the headers that are already provided by WinAFL to integrate with
another Static Binary Instrumentation tool called Syzygy.
kAFL Fuzzing
The way we integrate with kAFL is pretty simple.
Normally, a kAFL harness is running on a virtual machine and talks to the fuzzer's frontend using special "hypercalls".
These hypercalls tell the fuzzer to do many things, among them is to load coverage data from IntelPT and parse it as an AFL bitmap.
Because peafl64 makes IntelPT tracing obsolete, we must prepare a way to transmit the coverage data to the fuzzer.
Therefore, we expanded qemu and kvm with "hypercalls" that allow the (usermode) harness that runs in a VM to send the coverage data it collected using the helper driver.
ESXi Setup
This is specifically about the setup on ESXi but should be relevant for other virtualization platforms like AWS.
The setup is pretty simple:
- Create an Ubuntu machine
- Make sure "Expose hardware assisted virtualization to the guest OS" is enabled in the machine's CPU configuration
- Clone the sbi_kAFL repo
- Install kAFL normally as instructed in the kAFL repo, but instead of
install.sh qemu
step, runinstall.sh qemu_sbi
To fuzz using kAFL and peafl64, we need to setup a fuzzing machine:
- Compile the helper driver and sign it
- Compile a harness using the headers provided with our kAFL fork
- On the VM - load the helper driver
- Run the kAFL's loader
Other than that, fuzzing with our kAFL fork is the same as normal fuzzing with kAFL.
Execution Flow
Steps Overview:
- Determine insertion points for instrumentation code
- Locate instructions and structures that will need adjustments
- Insert the instrumentation code to the requested functions
- Adjust:
- Relative instructions
- Jump tables
- Exception handlers
- PE headers
- Various PE configurations (load config)
- Rebuilding the PE with the instrumented sections
First, we use IDA to find and analyze the instructions and locations of interest.
The analysis creates a dump.json
file that contains all that information in a json format.
instrument.py
contains the instrumentation logic, and the flow itself is outlined in the process_pe
function.
To instrument the binary, we duplicate its executable sections, where all the instrumented code will be found.
Next, we process all the relative instructions and determine how and if we need to handle them.
Suppose we have a short jmp from address X to address Y. We insert instrumentation code between X and Y, and now the target is at address Z (Z = Y + len(instrumentation)
).
That means we have to modify the jump.
If address Z is now located outside the range of short jmps - we need to transform the jump from short
to far
. (instrument.py:expand_relative_instructions)
For example:
short jmp 0x57
(eb 55) ->near jmp 0x604
(e9 ff 05 00 00)
That's also the case for instructions like call
, loop
and conditional jumps.
Another common case that needs handling is rip-relative instructions that were introduced in x64 assembly:
call [rip+0x1000]
We update the following things to point to the instrumented code: relocation table, export table, load configuration, the TLS directory and exception records.
The updates are done by updating all addresses from the original sections to their instrumented counterparts. (instrument.py:update_addr)
The headers are updated to reflect the changes to the PE structure - added sections, changed entrypoint and PE size. (instrument.py:update_pe_headers)
Additionally, peafl64 is able to instrument the Windows Kernel. To achieve that, it parses and updated the Dynamic Value Relocation Table (DVRT) and the SSDT, and handles PatchGuard specifics.
The DVRT is a compiler generated table which describes the locations of addresses that need to be changed when loading the PE.
It's used to improve KASLR and helps mitigate the Spectre vulnerability (1, 2, 3).
The DVRT is parsed using a set of classes that mimic the structure of the table (drt.py; instrument.py:get_updated_dynamic_relocs)
Handling and updating the SSDT wasn't as straight-forward.
Its address is not exported, so we had to either rely on symbols or on heuristics to locate it.
Our solution uses a heuristic approach, using NtWaitForSingleObject
as a constant marker for all Windows 10 versions and finding the address of the SSDT relatively to it. (instrument.py:ntoskrnl_update_KiServiceTable)
This approach works for all Windows 10 Kernels, but will need to be adjusted for other kernel versions.
TODO
support x64- New exception handlers (cxx4)
- Improve IDA dumper performance
- Fully parse LOAD_CONFIG struct
- Add tests
- Support more Windows Kernel versions out of the box
- Integrate with Nyx (the new kAFL)