Repository Details

Program synthesis based deobfuscation framework for the USENIX 2017 paper "Syntia: Synthesizing the Semantics of Obfuscated Code"

Syntia is a program synthesis based framework for deobfuscation. It uses instruction traces as a black-box oracle to produce random input and output pairs. From these I/O pairs, the synthesizer learns the code's underlying semantics.

The framework is based on our paper:

@inproceedings{blazytko2017syntia,
    author = {Blazytko, Tim and Contag, Moritz and Aschermann, Cornelius and Holz, Thorsten},
    title = {{Syntia: Synthesizing the Semantics of Obfuscated Code}},
    year = {2017},
    booktitle = {USENIX Security Symposium} 
}

Usage

The scripts demonstrate the usage of the framework.

Symbolic execution

To symbolically execute an instruction trace of an obfuscated expression, use

python2 scripts/symbolic_execution.py samples/tigress_mba_trace.bin x86_64

In this example, the expression is obfuscated via Mixed Boolean-Arithmetic (MBA). The final result is stored in EAX.
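
For readers unfamiliar with MBA, the following sketch checks one well-known identity, x + y = (x ^ y) + 2 * (x & y), on random 32-bit values. It only illustrates the idea; it is not the expression contained in the sample trace, and the function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF  # model 32-bit register arithmetic


def plain_add(x, y):
    return (x + y) & MASK32


def mba_add(x, y):
    # MBA rewrite of x + y: XOR adds without carries, AND marks the carry bits
    return ((x ^ y) + 2 * (x & y)) & MASK32


rng = random.Random(0)
checks = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(1000)]
all_equal = all(plain_add(x, y) == mba_add(x, y) for x, y in checks)
print(all_equal)  # True
```

Both functions compute the same value, but the second mixes arithmetic and bitwise operators, which makes the expression harder to simplify.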

Random Sampling

random_sampling.py generates random I/O pairs for a piece of code. Its output is a JSON file. To sample 20 times, use

python2 scripts/random_sampling.py samples/tigress_mba_trace.bin x86_64 20 mba_sampling.json

You can specify whether memory and/or register locations are treated as inputs and outputs.
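
Conceptually, random sampling treats the code as a black box: it feeds in random values and records what comes out. The sketch below shows this idea in plain Python. The function black_box is a hypothetical stand-in for running the obfuscated code, and the dictionary layout is an assumption for illustration, not the JSON format written by random_sampling.py.

```python
import json
import random

MASK32 = 0xFFFFFFFF  # assume 32-bit inputs and outputs


def black_box(a, b, c):
    # Hypothetical stand-in for executing the obfuscated code on concrete inputs
    return (a * b + c) & MASK32


def sample_io_pairs(oracle, num_inputs, num_samples, seed=0):
    """Query the oracle with random 32-bit inputs and record each result."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        inputs = [rng.getrandbits(32) for _ in range(num_inputs)]
        samples.append({"inputs": inputs, "output": oracle(*inputs)})
    return samples


samples = sample_io_pairs(black_box, num_inputs=3, num_samples=20)
print(json.dumps(samples[0]))
```

The sampler never inspects how the oracle computes its result, so the same loop works for any code fragment with known input and output locations.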

Program Synthesis for Obfuscated Code

sample_synthesis uses the I/O samples and synthesizes the semantics of each output. It is also possible to synthesize only specific outputs (e.g., EAX):

{
 "output": {
     "name": "EAX", 
     "number": 0, 
     "size": 32
 }, 
 "top_non_terminal": {
     "expression": {
         "infix": "((u32 * u32) + (u32 * 1))"
     }, 
     "reward": 1.0
 }, 
 "top_terminal": {
     "expression": {
         "infix": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))"
     }, 
     "reward": 1.0
 }, 
 "successful": "yes", 
 "result": {
     "final_expression": {
         "infix": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))", 
         "simplified": "((mem_0x2 * mem_0x0) + (mem_0x4 * 1))"
     }
 }
}

The MBA-obfuscated expression is equivalent to (mem_0x2 * mem_0x0) + mem_0x4, where mem_i corresponds to the i-th memory read.

Manual I/O Generation

If random sampling does not work, I/O pairs can be crafted with other methods, e.g., by changing and observing values in a debugger. We define each input and output as follows:

{
    "inputs": {
        "0": {
            "location": "mem0", 
            "size": "0x4"
        }, 
        "1": {
            "location": "mem1", 
            "size": "0x4"
        }
    },
    "outputs": {
        "0": {
            "location": "EAX", 
            "size": "0x4"
        }
    }, 
    "samples": [["0x2","0x6", "0xFFFFFFF9"],
                ["0x14e","0x213","0xFFFFFc2d"],
                ["0x3ed","0x2710","0xFFFFFBC8"]
                ]
}

Each list in samples defines the observed I/O pairs in one sampling step. Before synthesis, we use the script transform_manual_sampling_io_pairs.py to transform the file into the same format as the output of random_sampling.py.
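
The manual I/O description can also be generated programmatically. The following sketch assembles the structure shown above, assuming that each sample row lists the input values in order followed by the output values (which is how the example reads). The helper build_manual_io is hypothetical and not part of Syntia.

```python
import json


def build_manual_io(input_locations, output_locations, samples, size="0x4"):
    """Assemble a manual I/O description as a dictionary."""
    def numbered(locations):
        # Keys are the running indices "0", "1", ... as strings
        return {str(i): {"location": loc, "size": size}
                for i, loc in enumerate(locations)}

    return {
        "inputs": numbered(input_locations),
        "outputs": numbered(output_locations),
        "samples": samples,
    }


manual = build_manual_io(
    ["mem0", "mem1"],
    ["EAX"],
    [["0x2", "0x6", "0xFFFFFFF9"],
     ["0x14e", "0x213", "0xFFFFFc2d"],
     ["0x3ed", "0x2710", "0xFFFFFBC8"]],
)
serialized = json.dumps(manual, indent=4)
print(serialized)
```

Saving the printed text as manually_crafted.json yields a file that can be passed to the transformation script.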

python2 scripts/transform_manual_sampling_io_pairs.py manually_crafted.json sampling.json

Then, we can synthesize it as usual and obtain

{
    "0": {
        "output": {
            "name": "EAX", 
            "number": 0, 
            "size": 32
        }, 
        "top_non_terminal": {
            "expression": {
                "infix": "(~ ((u32 + u32) ^ (u32 & u32)))"
            }, 
            "reward": 1.0
        }, 
        "top_terminal": {
            "expression": {
                "infix": "(~ ((mem0 + mem0) ^ (mem0 & mem0)))"
            }, 
            "reward": 1.0
        }, 
        "successful": "yes", 
        "result": {
            "final_expression": {
                "infix": "(~ ((mem0 + mem0) ^ (mem0 & mem0)))", 
                "simplified": "~(2*mem0 ^ mem0)"
            }
        }
    }
}
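
As a sanity check, the simplified expression can be evaluated against the manually crafted samples. The sketch below assumes 32-bit arithmetic (EAX) and that each sample row is ordered as mem0, mem1, EAX. Note that mem1 does not appear in the synthesized expression.

```python
MASK32 = 0xFFFFFFFF  # EAX is a 32-bit output


def synthesized(mem0):
    # Simplified result reported above: ~(2*mem0 ^ mem0), truncated to 32 bits
    return ~((2 * mem0) ^ mem0) & MASK32


# Rows from the manual sampling file: mem0, mem1, EAX
samples = [["0x2", "0x6", "0xFFFFFFF9"],
           ["0x14e", "0x213", "0xFFFFFc2d"],
           ["0x3ed", "0x2710", "0xFFFFFBC8"]]

matches = [synthesized(int(row[0], 16)) == int(row[2], 16) for row in samples]
print(matches)  # [True, True, True]
```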

General Program Synthesis

mcts_synthesis_multi_core.py shows a basic usage of the synthesis algorithm. It can be used to test the synthesis of different expressions (which can be defined in oracle). It also allows testing the synthesis behavior for different configuration parameters.

Structure

Syntia's code is structured in three parts: symbolic execution of obfuscated code, generation of I/O pairs from binary code, and the program synthesizer.

symbolic_execution

A wrapper around Miasm's symbolic execution engine. We use it to symbolically execute pieces of obfuscated code.

kadabra

Kadabra is our blanket execution framework, built on top of Unicorn Engine. Among other things, it supports instruction tracing, enforcing execution paths and tracing memory modifications.

assembly_oracle

The assembly oracle utilizes binary code as a black box and generates I/O pairs for the synthesizer. It is built upon Kadabra.

mcts

This is the core of Syntia: Monte Carlo Tree Search based program synthesis. Given I/O pairs from the assembly oracle, the synthesizer finds semantically equivalent non-obfuscated code.
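
To illustrate what synthesis from I/O pairs means, the toy sketch below searches a tiny expression grammar for a candidate that reproduces all samples. It deliberately uses exhaustive enumeration and an exact-match score instead of Monte Carlo Tree Search; Syntia's actual search strategy and reward function are described in the paper and are not reproduced here. All names (hidden, evaluate, reward, synthesize) are hypothetical.

```python
import itertools
import operator
import random

MASK32 = 0xFFFFFFFF
OPS = {"+": operator.add, "*": operator.mul,
       "^": operator.xor, "&": operator.and_}
VARS = ("a", "b", "c")


def hidden(a, b, c):
    # Hypothetical black box playing the role of the obfuscated code
    return (a * b + c) & MASK32


def evaluate(candidate, env):
    # candidate is (v1, op1, v2, op2, v3), read as ((v1 op1 v2) op2 v3)
    v1, op1, v2, op2, v3 = candidate
    inner = OPS[op1](env[v1], env[v2]) & MASK32
    return OPS[op2](inner, env[v3]) & MASK32


def reward(candidate, samples):
    # Fraction of I/O pairs that the candidate reproduces exactly
    hits = sum(evaluate(candidate, env) == out for env, out in samples)
    return hits / float(len(samples))


def synthesize(samples):
    # Exhaustive search over the tiny grammar (a stand-in for guided search)
    space = itertools.product(VARS, OPS, VARS, OPS, VARS)
    return max(space, key=lambda cand: reward(cand, samples))


rng = random.Random(0)
samples = []
for _ in range(20):
    env = {name: rng.getrandbits(32) for name in VARS}
    samples.append((env, hidden(**env)))

best = synthesize(samples)
print("((%s %s %s) %s %s)" % best, reward(best, samples))
```

The grammar here contains only 432 candidates, so enumeration is feasible. For realistic grammars the search space grows too quickly, which is why a guided search such as MCTS is used.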

utils

Provides basic functionality that is used across the different subprojects. Furthermore, it contains some code that illustrates the parsing and usage of the random sampling results for program synthesis.

Setup

Dependencies

The file install_deps.sh contains the build process for our dependencies. Major parts of our framework can be used without all dependencies. In particular, we use Miasm for symbolic execution and Unicorn Engine as the basis of Kadabra.

Docker

We provide a Docker container that contains all dependencies (but not Syntia itself). To build it, use the following commands:

# build docker container
docker build -t <container name> <directory with Dockerfile>

# run docker container interactively
docker run -it <container name> /bin/bash

The container's superuser password is root.

Contact

tim DOT blazytko AT rub DOT de
