  • Stars: 1,290
  • Rank: 35,209 (Top 0.8 %)
  • Language: Python
  • License: MIT License
  • Created almost 4 years ago
  • Updated 9 months ago


Repository Details

Windows kernel and user mode emulation.

Speakeasy

Speakeasy is a portable, modular, binary emulator designed to emulate Windows kernel and user mode malware.

Check out the overview in the first Speakeasy blog post.

Instead of attempting to perform dynamic analysis using an entire virtualized operating system, Speakeasy emulates specific components of Windows. By emulating operating system APIs, objects, running processes/threads, filesystems, and networks, it should be possible to present an environment where samples can fully "execute". Samples can easily be emulated in a container or in cloud services, which makes it possible to scale out and analyze many samples simultaneously. Currently, Speakeasy supports both user mode and kernel mode Windows applications.

Before emulating, entry points are identified within the binary. For example, exported functions are all identified and emulated sequentially. Additionally, dynamic entry points (e.g. new threads, registered callbacks, IRP handlers) that are discovered at runtime are also emulated. The goal here is to have as much code coverage as possible during emulation. Events are logged on a per-entry-point basis so that functionality can be attributed to specific functions or exports.

Speakeasy is currently written entirely in Python 3 and relies on the Unicorn emulation engine in order to emulate CPU instructions. The CPU emulation engine can be swapped out and there are plans to support other engines in the future.

APIs are emulated in Python code that handles their expected inputs and outputs in order to keep malware on its "happy path". These APIs and their structure should be consistent with the API documentation provided by Microsoft.


Installation

Speakeasy can be executed in a Docker container, as a stand-alone script, or in cloud services. The easiest way to install it is to first install the required package dependencies and then run the included setup.py script (replace "python3" with your current Python 3 interpreter):

cd <repo_base_dir>
python3 -m pip install -r requirements.txt
python3 setup.py install

A Dockerfile is also included for building a Docker image. Alternatively, Speakeasy's dependencies can be installed on the local system and Speakeasy can be run from Python directly.


Running within a docker container

The included Dockerfile can be used to generate a docker image.


Building the docker image

  1. Build the Docker image; the following commands will create an image tagged "my_tag":
cd <repo_base_dir>
docker build -t "my_tag" .
  2. Run the Docker image and mount a local directory as a volume at /sandbox:
docker run -v <path_containing_malware>:/sandbox -it "my_tag"

Usage


As a library

Speakeasy can be imported and used as a general purpose Windows emulation library. The main public interface, named Speakeasy, should be used when interacting with the framework. The lower level emulator objects can also be used; however, their interfaces may change in the future and may lack documentation.

Below is a quick example of how to emulate a Windows DLL:

    import speakeasy

    # Get a speakeasy object
    se = speakeasy.Speakeasy()

    # Load a DLL into the emulation space
    module = se.load_module("myfile.dll")

    # Emulate the DLL's entry point (i.e. DllMain)
    se.run_module(module)

    # Set up some args for the export
    arg0 = 0x0
    arg1 = 0x1
    # Walk the DLL's exports
    for exp in module.get_exports():
        if exp.name == 'myexport':
            # Call an export named 'myexport' and emulate it
            se.call(exp.address, [arg0, arg1])

    # Get the emulation report
    report = se.get_report()

    # Do something with the report; parse it or save it off for post-processing

For more examples, see the examples directory.
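
As one way to act on the report returned by se.get_report() in the example above, the sketch below counts API calls per entry point and saves the report as JSON. The report layout used here (an "entry_points" list whose items hold an "ep_type" field and an "apis" list of dicts with an "api_name" key) is an assumption and is not described in this document, so the code treats missing keys as empty. The function names and the sample data are hypothetical.

```python
import json
from collections import Counter


def summarize_report(report):
    """Count API calls per entry point in a Speakeasy-style report dict.

    Assumed layout (not confirmed here): report["entry_points"] is a list,
    each item has "ep_type" and an "apis" list of {"api_name": ...} dicts.
    """
    summary = {}
    for index, entry_point in enumerate(report.get("entry_points", [])):
        # Label each entry point by position and (assumed) type field
        label = f"{index}:{entry_point.get('ep_type', 'unknown')}"
        calls = Counter(
            api.get("api_name", "?") for api in entry_point.get("apis", [])
        )
        summary[label] = dict(calls)
    return summary


def save_report(report, path):
    """Write the report to disk as indented JSON for post-processing."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)


# Hand-made stand-in for the value returned by se.get_report()
sample_report = {
    "entry_points": [
        {"ep_type": "dll_entry", "apis": [{"api_name": "kernel32.GetTickCount"}]},
        {
            "ep_type": "export.myexport",
            "apis": [
                {"api_name": "kernel32.HeapAlloc"},
                {"api_name": "kernel32.HeapAlloc"},
            ],
        },
    ]
}

print(summarize_report(sample_report))
```

Grouping by entry point mirrors the per-entry-point logging described earlier, so activity can still be attributed to a specific export after post-processing.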


As a standalone command line tool

For users who don't wish to programmatically interact with the Speakeasy framework as a library, a standalone script is provided to automatically emulate Windows binaries. Speakeasy can be invoked by running the command speakeasy. This command will parse a specified PE and invoke the appropriate emulator (kernel mode or user mode). The script's parameters are shown below.

usage: speakeasy [-h] [-t TARGET] [-o OUTPUT] [-p [PARAMS ...]] [-c CONFIG] [-m] [-r] [--raw_offset RAW_OFFSET]
                        [-a ARCH] [-d DUMP_PATH] [-q TIMEOUT] [-z DROP_FILES_PATH] [-l MODULE_DIR] [-k] [--no-mp]

Emulate a Windows binary with speakeasy

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET, --target TARGET
                        Path to input file to emulate
  -o OUTPUT, --output OUTPUT
                        Path to output file to save report
  -p [PARAMS ...], --params [PARAMS ...]
                        Commandline parameters to supply to emulated process (e.g. main(argv))
  -c CONFIG, --config CONFIG
                        Path to emulator config file
  -m, --mem-tracing     Enables memory tracing. This will log all memory access by the sample but will impact speed
  -r, --raw             Attempt to emulate file as-is with no parsing (e.g. shellcode)
  --raw_offset RAW_OFFSET
                        When in raw mode, offset (hex) to start emulating
  -a ARCH, --arch ARCH  Force architecture to use during emulation (for multi-architecture files or shellcode). Supported
                        archs: [ x86 | amd64 ]
  -d DUMP_PATH, --dump DUMP_PATH
                        Path to store compressed memory dump package
  -q TIMEOUT, --timeout TIMEOUT
                        Emulation timeout in seconds (default 60 sec)
  -z DROP_FILES_PATH, --dropped-files DROP_FILES_PATH
                        Path to store files created during emulation
  -l MODULE_DIR, --module-dir MODULE_DIR
                        Path to directory containing loadable PE modules. When modules are parsed or loaded by samples, PEs
                        from this directory will be loaded into the emulated address space
  -k, --emulate-children
                        Emulate any processes created with the CreateProcess APIs after the input file finishes emulating
  --no-mp               Run emulation in the current process to assist instead of a child process. Useful when
                        debugging speakeasy itself (using pdb.set_trace()).

Examples

Emulating a Windows driver:

user@mybox:~/speakeasy$ speakeasy -t ~/drivers/MyDriver.sys

Emulating 32-bit Windows shellcode:

user@mybox:~/speakeasy$ speakeasy -t ~/sc.bin  -r -a x86

Emulating 64-bit Windows shellcode and creating a full memory dump:

user@mybox:~/speakeasy$ speakeasy -t ~/sc.bin  -r -a x64 -d memdump.zip
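
When many samples need to be processed, the command lines above can be generated from a script. The following is a minimal sketch that builds the argument list using only flags shown in the usage text (-t, -o, -r, -a, -d, -q, -p). The helper names build_speakeasy_cmd and run_speakeasy are hypothetical and are not part of Speakeasy.

```python
import subprocess


def build_speakeasy_cmd(target, output=None, raw=False, arch=None,
                        dump_path=None, timeout=None, params=None):
    """Build an argv list for the speakeasy command from documented flags."""
    cmd = ["speakeasy", "-t", str(target)]
    if output:
        cmd += ["-o", str(output)]
    if raw:
        cmd.append("-r")
    if arch:
        cmd += ["-a", arch]
    if dump_path:
        cmd += ["-d", str(dump_path)]
    if timeout is not None:
        cmd += ["-q", str(timeout)]
    if params:
        # -p accepts several values; placing it last keeps them grouped
        cmd += ["-p", *params]
    return cmd


def run_speakeasy(**kwargs):
    """Run the command; requires speakeasy to be installed and on PATH."""
    return subprocess.run(build_speakeasy_cmd(**kwargs), check=True)


# Equivalent of the 32-bit shellcode example, with a memory dump added
cmd = build_speakeasy_cmd("sc.bin", raw=True, arch="x86", dump_path="memdump.zip")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with sample paths that contain spaces.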

Configuration

Speakeasy uses configuration files that describe the environment that is presented to the emulated binaries. For a full description of these fields see the README here.


Memory Management

Speakeasy implements a lightweight memory manager on top of the emulator engine's memory management. Each chunk of memory allocated by malware is tracked and tagged so that meaningful memory dumps can be acquired. Being able to attribute activity to specific chunks of memory can prove extremely useful for analysts. Logging memory reads and writes to sensitive data structures can reveal malware intent that API call logging alone does not show, which is particularly useful for samples such as rootkits.
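
To illustrate the idea of tagged chunks, here is a toy sketch of an allocator that labels each allocation and attributes memory accesses to the chunk that contains them. It is not Speakeasy's memory manager; the class names, tag strings, and base address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    base: int
    size: int
    tag: str
    accesses: list = field(default_factory=list)


class TaggedMemoryMap:
    """Toy allocator that tags each chunk and logs accesses against it."""

    def __init__(self, start=0x10000, page_size=0x1000):
        self.next_base = start
        self.page_size = page_size
        self.chunks = []

    def alloc(self, size, tag):
        # Round the request up to whole pages, as emulator engines map pages
        pages = (size + self.page_size - 1) // self.page_size
        chunk = Chunk(self.next_base, pages * self.page_size, tag)
        self.next_base += chunk.size
        self.chunks.append(chunk)
        return chunk

    def find(self, address):
        """Return the chunk containing address, or None if unmapped."""
        for chunk in self.chunks:
            if chunk.base <= address < chunk.base + chunk.size:
                return chunk
        return None

    def log_access(self, kind, address, length):
        """Record a read or write as an offset into its owning chunk."""
        chunk = self.find(address)
        if chunk is not None:
            chunk.accesses.append((kind, address - chunk.base, length))
        return chunk


memory = TaggedMemoryMap()
heap = memory.alloc(0x20, tag="api.heap.HeapAlloc")
stack = memory.alloc(0x2000, tag="emu.stack")
hit = memory.log_access("write", stack.base + 8, 4)
print(hit.tag, hit.accesses)
```

Because every access is stored against a tagged chunk, a later memory dump can be labeled by purpose rather than by raw address range.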


Speed

Because Speakeasy is written in Python, speed is an obvious concern. Transitioning between native code and Python is extremely expensive and should be done as little as possible. Therefore, the goal is to execute Python code only when it is absolutely necessary. By default, the only events handled in Python are memory access exceptions and Windows API calls. In order to catch Windows API calls and emulate them in Python, import tables are doped with invalid memory addresses so that Python code is only executed when import tables are accessed. A similar technique is used when shellcode accesses the export tables of DLLs loaded within its emulated address space. By executing as little Python code as possible, reasonable speeds can be achieved while still allowing users to rapidly develop capabilities for the framework.
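
The import table technique can be pictured with a small model. In the sketch below, each import slot is filled with an address from a range that is never mapped, so the emulated code runs natively until it jumps to one of those addresses, and only then is a Python handler looked up. This is an illustration under stated assumptions, not Speakeasy's code: the ImportTrap class, the sentinel base address, and the lambda handlers are hypothetical, and a real emulator would invoke on_invalid_fetch from its memory fault callback.

```python
class ImportTrap:
    """Toy model: import slots hold unmapped sentinel addresses."""

    def __init__(self, sentinel_base=0xFEED0000, slot_size=8):
        self.sentinel_base = sentinel_base
        self.slot_size = slot_size
        self.handlers = {}  # sentinel address -> (API name, Python handler)

    def dope_import(self, name, handler):
        """Reserve a sentinel address for an import and return it."""
        address = self.sentinel_base + len(self.handlers) * self.slot_size
        self.handlers[address] = (name, handler)
        return address

    def on_invalid_fetch(self, address, args):
        """Dispatch to a handler if the faulting address is a sentinel."""
        entry = self.handlers.get(address)
        if entry is None:
            # A fault outside the sentinel range is a genuine error
            raise MemoryError(f"unhandled fault at {address:#x}")
        name, handler = entry
        return name, handler(*args)


trap = ImportTrap()
# Stand-in for an import address table filled in at load time
iat = {
    "kernel32.GetTickCount": trap.dope_import("kernel32.GetTickCount", lambda: 1000),
    "kernel32.Sleep": trap.dope_import("kernel32.Sleep", lambda ms: 0),
}
print(trap.on_invalid_fetch(iat["kernel32.Sleep"], [500]))
```

The dictionary lookup is the only Python work done per API call, which matches the stated goal of entering Python as rarely as possible.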


Limitations

Since we do not rely on a physical OS to handle API calls, object and memory allocation, and I/O operations, these responsibilities fall to the emulator. Upon emulating multiple samples, users are likely to encounter samples that do not fully emulate. This can most likely be attributed to missing API handlers, specific OS implementation details, or environmental factors. For more details see doc/limitations.


Module export parsing

Many malware samples, such as shellcode, will attempt to manually parse the export tables of PE modules in order to resolve API function pointers. An attempt is made to build "decoy" export tables using the emulated function names currently supported, but this may not be enough for some samples. The configuration files support two fields named module_directory_x86 and module_directory_x64. These fields are directories that can contain DLLs or other modules that are loaded into the virtual address space of the emulated sample. There is also a command line option (-l) that can specify this directory at runtime. This can be useful for samples that do deep parsing of PE modules that are expected to be loaded in memory.
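
As a sketch of how these two fields might be overridden from a script, the helper below returns a copy of a configuration with module_directory_x86 and module_directory_x64 replaced. The field names come from the text above; where they sit inside the configuration is not stated there, so the helper searches every nested dictionary. The function name, the "modules" section in the sample, and the directory paths are hypothetical.

```python
import copy


def set_module_directories(config, x86_dir=None, x64_dir=None):
    """Return a copy of config with the module directory fields replaced.

    The location of the fields is an assumption-free search: any dict at
    any depth that contains one of the two keys gets the new value.
    """
    wanted = {"module_directory_x86": x86_dir, "module_directory_x64": x64_dir}
    updated = copy.deepcopy(config)

    def visit(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in wanted and wanted[key] is not None:
                    node[key] = wanted[key]
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(updated)
    return updated


# Hypothetical configuration fragment; the "modules" section is assumed
sample_config = {
    "modules": {
        "module_directory_x86": "decoys/x86",
        "module_directory_x64": "decoys/x64",
    }
}
new_config = set_module_directories(sample_config, x86_dir="/sandbox/dlls/x86")
print(new_config["modules"])
```

Working on a deep copy leaves the original configuration untouched, so the same base configuration can be reused across samples.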


Adding API handlers

As in most emulators, API calls made to the OS are handled by the framework. Emulated API handlers can be added by simply defining a function with the correct name in its corresponding emulated module. Depending on the outputs expected by the API, it may be sufficient to simply return a success code. The argument count must be specified in order for the stack to be cleaned up correctly. If no calling convention is specified, stdcall is assumed. The argument list is passed to the emulated function as raw integers.

Below is an example of an API handler for the HeapAlloc function in the kernel32 module.

    @apihook('HeapAlloc', argc=3)
    def HeapAlloc(self, emu, argv, ctx={}):
        '''
        DECLSPEC_ALLOCATOR LPVOID HeapAlloc(
          HANDLE hHeap,
          DWORD  dwFlags,
          SIZE_T dwBytes
        );
        '''

        hHeap, dwFlags, dwBytes = argv

        chunk = self.heap_alloc(dwBytes, heap='HeapAlloc')
        if chunk:
            emu.set_last_error(windefs.ERROR_SUCCESS)

        return chunk
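
To show why the argument count matters, the following self-contained sketch mimics the decorator pattern with a stand-in registry. It is not Speakeasy's apihook implementation: apihook_sketch, API_TABLE, dispatch, and the dictionary used as the emulator state are hypothetical. With stdcall the callee removes its arguments from the stack, so the dispatcher needs argc to know how many bytes to pop.

```python
API_TABLE = {}


def apihook_sketch(name, argc, conv="stdcall"):
    """Register a handler under an API name with its argument count."""
    def decorator(func):
        API_TABLE[name] = (func, argc, conv)
        return func
    return decorator


@apihook_sketch("GetLastError", argc=0)
def get_last_error(emu, argv):
    return emu["last_error"]


@apihook_sketch("CloseHandle", argc=1)
def close_handle(emu, argv):
    (handle,) = argv
    emu["handles"].discard(handle)
    return 1  # CloseHandle returns nonzero on success


def dispatch(emu, name, stack_args, ptr_size=4):
    """Call a handler and report how many stack bytes stdcall would pop."""
    func, argc, conv = API_TABLE[name]
    # Only the first argc stack slots belong to this call
    argv = stack_args[:argc]
    ret = func(emu, argv)
    cleanup = argc * ptr_size if conv == "stdcall" else 0
    return ret, cleanup


emu = {"last_error": 0, "handles": {0x44}}
print(dispatch(emu, "CloseHandle", [0x44, 0xDEAD]))
```

A handler registered with the wrong argc would leave the emulated stack misaligned after the call returns, which is why the count is part of the registration rather than inferred.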

Further information
