  • Language: Python
  • License: Apache License 2.0

Quarkslab Bindiffer, but not only!

QBinDiff

QBinDiff is an experimental binary diffing tool that addresses diffing as a Network Alignment Quadratic Problem.

But why develop yet another differ when BinDiff works well?

BinDiff is great, no doubt about it, but it gives us no control over the diffing process. Also, it works well on standard binaries but lacks flexibility in some corner cases (embedded firmware, diffing two portions of the same binary, etc.).

A key idea of QBinDiff is to let users tune the diffing programmatically by:

  • writing their own features
  • enforcing some matches
  • putting the emphasis either on the content of functions (similarity) or on the links between them (call graph)

In essence, the idea is to be able to diff using your own criteria, which are not necessarily the control flow and the instructions but could, for instance, be data-oriented.
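As an illustration of a data-oriented criterion, the sketch below ignores control flow entirely and describes each function only by the strings it references, then compares those vectors with the Canberra distance (the default distance of the `qbindiff` command line). Everything else is hypothetical: the function names, the data and the greedy nearest-neighbour pairing are not QBinDiff's API or algorithm, which combines features with call-graph information.

```python
# Hypothetical data-oriented criterion: describe each function by the strings it
# references and compare with the Canberra distance. Names, data and the greedy
# nearest-neighbour pairing are illustrative, not QBinDiff's API or algorithm.


def string_ref_vector(refs, vocabulary):
    """Count how many times the function references each vocabulary string."""
    return [refs.count(s) for s in vocabulary]


def canberra(u, v):
    """Canberra distance; coordinates where both values are 0 contribute 0."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


def nearest_matches(primary, secondary, vocabulary):
    """Pair each primary function with its closest secondary function."""
    vec2 = {name: string_ref_vector(refs, vocabulary) for name, refs in secondary.items()}
    matches = {}
    for name, refs in primary.items():
        vec1 = string_ref_vector(refs, vocabulary)
        matches[name] = min(vec2, key=lambda other: canberra(vec1, vec2[other]))
    return matches


VOCAB = ["usage", "error", "version"]
PRIMARY = {"f1": ["usage", "error", "error"], "f2": ["version"]}
SECONDARY = {"g1": ["version"], "g2": ["error", "usage"]}
print(nearest_matches(PRIMARY, SECONDARY, VOCAB))  # {'f1': 'g2', 'f2': 'g1'}
```

A greedy per-function choice like this can map two primary functions to the same secondary one; avoiding that, and accounting for calls between functions, is what the alignment formulation is for.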

Last, QBinDiff has primarily been designed with the binary-diffing use case in mind, but it can be applied to various other use cases, such as social networks. Indeed, diffing two programs boils down to determining the best alignment of their call graphs according to some similarity criterion.

This problem is APX-hard, which is why QBinDiff uses a machine learning approach (more precisely, optimization) to approximate the best match.
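To make the alignment problem concrete, the toy sketch below scores a one-to-one mapping between two tiny call graphs as a weighted sum of function-content similarity and conserved call edges (the quadratic part of the problem), then finds the best mapping by brute force, optionally honoring enforced matches. This is not QBinDiff's algorithm: the function names, the scoring formula and the data are assumptions for illustration, and `tradeoff` only mimics the meaning of the command-line `--tradeoff` option. Exhaustive search is exponential in the number of functions, which is precisely why an approximation is needed on real binaries.

```python
from itertools import permutations

# Toy sketch of a network alignment objective; NOT QBinDiff's algorithm.
# All names and data below are hypothetical.


def alignment_score(sim, edges1, edges2, mapping, tradeoff=0.75):
    """Score a one-to-one mapping between two small call graphs.

    sim[i][j]: content similarity of primary function i and secondary function j
    edges1, edges2: sets of (caller, callee) pairs
    mapping: dict primary -> secondary
    tradeoff: weight of function content (1.0) vs. call-graph structure (0.0)
    """
    content = sum(sim[i][j] for i, j in mapping.items())
    # Quadratic term: a call edge is conserved when its image under the
    # mapping is also a call edge of the secondary graph.
    conserved = sum(
        1
        for a, b in edges1
        if a in mapping and b in mapping and (mapping[a], mapping[b]) in edges2
    )
    return tradeoff * content + (1 - tradeoff) * conserved


def best_alignment(sim, edges1, edges2, tradeoff=0.75, forced=None):
    """Exhaustive search (exponential): only usable on a handful of functions.

    forced: optional dict of enforced matches {primary: secondary}.
    Assumes len(sim) <= len(sim[0]).
    """
    forced = forced or {}
    best, best_score = None, float("-inf")
    for perm in permutations(range(len(sim[0])), len(sim)):
        mapping = dict(enumerate(perm))
        if any(mapping[i] != j for i, j in forced.items()):
            continue  # violates an enforced match
        score = alignment_score(sim, edges1, edges2, mapping, tradeoff)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Two identical call chains 0 -> 1 -> 2; content similarity alone confuses 1 and 2.
SIM = [[0.9, 0.1, 0.1], [0.1, 0.5, 0.6], [0.1, 0.6, 0.5]]
EDGES = {(0, 1), (1, 2)}

print(best_alignment(SIM, EDGES, EDGES, tradeoff=1.0)[0])  # {0: 0, 1: 2, 2: 1}
print(best_alignment(SIM, EDGES, EDGES, tradeoff=0.5)[0])  # {0: 0, 1: 1, 2: 2}
```

With content only, the misleading similarities swap functions 1 and 2; giving weight to the call graph restores the structurally consistent mapping.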

Like BinDiff, QBinDiff works on an exported disassembly of the program obtained from IDA. Originally based on BinExport, it now also supports Quokka as a backend, whose exported files are more exhaustive and also more compact on disk (good for large binary datasets).

Note

QBinDiff is an experimental tool for power users, in which many parameters, features, thresholds and weights can be adjusted. Obtaining good results usually requires tuning these parameters.

(Please note that QBinDiff does not aim to be faster than other differs, but rather more flexible.)

Warning

QBinDiff's graph alignment is very memory-intensive (it computes large matrices) and can fill up the RAM if you are not cautious. Try not to diff binaries with more than about 10k functions. For large programs, use a very high sparsity ratio (e.g. 0.99).
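Some rough arithmetic shows why: a dense matrix holding one 8-byte float per pair of functions already needs 0.8 GB for two 10k-function binaries, before counting any intermediate matrices. The sketch below assumes float64 values and a simple per-row top-k rule (loosely inspired by the `--sparse-row` option); QBinDiff's real data types and sparsification code may differ, so treat the numbers as orders of magnitude only.

```python
# Back-of-the-envelope memory estimate and a toy row-wise sparsification.
# Assumes one float64 per function pair; QBinDiff's real data types and
# intermediate matrices may differ, so treat the numbers as orders of magnitude.


def dense_matrix_bytes(n_primary, n_secondary, itemsize=8):
    """Bytes needed for a dense similarity matrix (one value per pair)."""
    return n_primary * n_secondary * itemsize


def sparsify_row(row, sparsity_ratio):
    """Drop the `sparsity_ratio` fraction of least similar candidates in a row.

    Returns {column_index: similarity} for the kept entries (at least one).
    """
    n_keep = max(1, round(len(row) * (1 - sparsity_ratio)))
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return {j: row[j] for j in sorted(ranked[:n_keep])}


n = 10_000
print(dense_matrix_bytes(n, n) / 1e9, "GB dense")  # 0.8 GB for 10k x 10k functions
kept_per_row = len(sparsify_row([0.0] * n, 0.99))
print(kept_per_row * n, "entries kept with a 0.99 sparsity ratio")
```

With a 0.99 ratio only 100 candidates per function survive, i.e. one million entries instead of one hundred million, although a sparse format also stores indices, so the saving is somewhat below 100x.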

Documentation

The documentation can be found on the diffing portal, or it can be built manually with:

pip install .[doc]
cd doc
make html

Below you will find some sections extracted from the documentation. Please refer to the full documentation in case of issues.

Installation

QBinDiff can be installed through pip with:

pip install qbindiff

As some parts of the algorithm are very CPU-intensive, the installation compiles some components written in native C/C++.

As depicted above, QBinDiff relies on some projects (also developed at Quarkslab):

  • python-binexport, a wrapper around the BinExport protobuf format
  • python-bindiff, a wrapper around BinDiff (used to write results as BinDiff databases)
  • Quokka, another binary exporter based on IDA, faster than BinExport and more exhaustive (thus making the diffing more relevant)

Usage (command line)

After installation, the qbindiff binary is available in the PATH. It takes two exported files as input and starts the diffing analysis. The result can then be exported in the BinDiff file format. The default format for input files is BinExport; for a complete list of backend loaders, see the -l1, --loader1 option in the help. The complete command-line options are:

Usage: qbindiff [OPTIONS] <primary file> <secondary file>

  QBinDiff is an experimental binary diffing tool based on machine learning technics, namely Belief propagation.

Options:
  -l1, --loader1 <loader>       Loader type to be used for the primary. Must be one of these ['binexport', 'quokka',
                                'ida']  [default: binexport]
  -l2, --loader2 <loader>       Loader type to be used for the secondary. Must be one of these ['binexport', 'quokka',
                                'ida']  [default: binexport]
  -f, --feature <feature>       Features to use for the binary analysis, it can be specified multiple times.
                                Features may be weighted by a positive value (default 1.0) and/or compared with a
                                specific distance (by default the option -d is used) like this <feature>:<weight>:<distance>.
                                For a list of all the features available see --list-features.
  -n, --normalize               Normalize the Call Graph (can potentially lead to a partial matching). [default
                                disabled]
  -d, --distance <function>     The following distances are available ['canberra', 'euclidean', 'cosine',
                                'jaccard_strong']  [default: canberra]
  -s, --sparsity-ratio FLOAT    Ratio of least probable matches to ignore. Between 0.0 (nothing is ignored) to 1.0
                                (only perfect matches are considered)  [default: 0.75]
  -sr, --sparse-row             Whether to build the sparse similarity matrix considering its entirety or processing
                                it row per row
  -t, --tradeoff FLOAT          Tradeoff between function content (near 1.0) and call-graph information (near 0.0)
                                [default: 0.75]
  -e, --epsilon FLOAT           Relaxation parameter to enforce convergence  [default: 0.5]
  -i, --maxiter INTEGER         Maximum number of iteration for belief propagation  [default: 1000]
  -e1, --executable1 PATH       Path to the primary raw executable. Must be provided if using quokka loader
  -e2, --executable2 PATH       Path to the secondary raw executable. Must be provided if using quokka loader
  -o, --output PATH             Write output to PATH
  -ff, --file-format [bindiff]  The file format of the output file. Supported formats are [bindiff]  [default:
                                bindiff]
  -v, --verbose                 Activate debugging messages. Can be supplied multiple times to increase verbosity
  --version                     Show the version and exit.
  --arch-primary TEXT           Force the architecture when disassembling for the primary. Format is
                                'CS_ARCH_X:CS_MODE_Ya,CS_MODE_Yb,...'
  --arch-secondary TEXT         Force the architecture when disassembling for the secondary. Format is
                                'CS_ARCH_X:CS_MODE_Ya,CS_MODE_Yb,...'
  --list-features               List all the available features
  -h, --help                    Show this message and exit.
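The `-f, --feature` value packs up to three fields into one string. The hypothetical parser below is not qbindiff's own code; it only shows how `<feature>:<weight>:<distance>` resolves with the defaults stated in the help (weight 1.0, distance taken from `-d`, itself `canberra` by default). It assumes the fields are positional, and the feature names used are placeholders: run `qbindiff --list-features` for the real ones.

```python
from typing import NamedTuple

# Hypothetical re-implementation of the documented "-f" syntax, for illustration;
# it is not qbindiff's own parser and feature names are placeholders.
DISTANCES = {"canberra", "euclidean", "cosine", "jaccard_strong"}


class FeatureSpec(NamedTuple):
    name: str
    weight: float
    distance: str


def parse_feature_spec(spec: str, default_distance: str = "canberra") -> FeatureSpec:
    """Parse '<feature>[:<weight>[:<distance>]]' with the documented defaults."""
    parts = spec.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"invalid feature spec: {spec!r}")
    weight = float(parts[1]) if len(parts) > 1 and parts[1] else 1.0
    if weight <= 0:
        raise ValueError(f"feature weight must be positive, got {weight}")
    distance = parts[2] if len(parts) > 2 and parts[2] else default_distance
    if distance not in DISTANCES:
        raise ValueError(f"unknown distance: {distance!r}")
    return FeatureSpec(parts[0], weight, distance)


print(parse_feature_spec("my_feature"))
print(parse_feature_spec("my_feature:2.5:cosine"))
```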

Library usage

The strength of QBinDiff is that it can be used as a Python library. The following snippet shows an example of loading two BinExport files and comparing them using the Weisfeiler-Lehman feature.

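from qbindiff import QBinDiff, Program
from qbindiff.features import WeisfeilerLehman
from pathlib import Path

# Load the two exported programs (BinExport files)
p1 = Program(Path("primary.BinExport"))
p2 = Program(Path("secondary.BinExport"))

# Register the Weisfeiler-Lehman feature with weight 1.0, compared with the cosine distance
differ = QBinDiff(p1, p2)
differ.register_feature_extractor(WeisfeilerLehman, 1.0, distance='cosine')

# Run the diffing analysis
differ.process()

# Retrieve the matches and keep the (primary, secondary) function address pairs
mapping = differ.compute_matching()
output = {(match.primary.addr, match.secondary.addr) for match in mapping}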
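For intuition about what a Weisfeiler-Lehman feature captures, here is a simplified sketch, not QBinDiff's `WeisfeilerLehman` implementation: each node of a graph (hypothetically, one label per basic block of a function's control-flow graph, with edges treated as undirected) is repeatedly relabeled with its own label plus its neighbours' labels, and the resulting label histograms are compared with a cosine distance, the distance chosen in the snippet above.

```python
from collections import Counter
from math import sqrt

# Simplified Weisfeiler-Lehman sketch for intuition only; it is not QBinDiff's
# WeisfeilerLehman feature. Node labels (e.g. one label per basic block) are hypothetical.


def wl_histogram(labels, adjacency, iterations=2):
    """Count the labels produced by `iterations` rounds of WL relabeling.

    labels: dict node -> initial label
    adjacency: dict node -> iterable of neighbour nodes
    """
    current = dict(labels)
    histogram = Counter(current.values())
    for _ in range(iterations):
        # New label = own label + sorted labels of the neighbours. Real
        # implementations compress these strings (e.g. by hashing).
        current = {
            node: current[node]
            + "("
            + ",".join(sorted(current[m] for m in adjacency.get(node, ())))
            + ")"
            for node in current
        }
        histogram.update(current.values())
    return histogram


def cosine_distance(h1, h2):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())
    norm = sqrt(sum(v * v for v in h1.values())) * sqrt(sum(v * v for v in h2.values()))
    return max(0.0, 1.0 - dot / norm) if norm else 1.0


# Same 3-block graph with different node numbering, plus a variant ending in a call.
a = wl_histogram({0: "push", 1: "cmp", 2: "ret"}, {0: [1], 1: [0, 2], 2: [1]})
b = wl_histogram({7: "ret", 8: "cmp", 9: "push"}, {7: [8], 8: [9, 7], 9: [8]})
c = wl_histogram({0: "push", 1: "cmp", 2: "call"}, {0: [1], 1: [0, 2], 2: [1]})
print(cosine_distance(a, b), cosine_distance(a, c))
```

Because the histograms ignore node numbering, two functions with the same structure and labels end up at distance zero even if their basic blocks were laid out differently.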

Contributing & Contributors

Any help or feedback is greatly appreciated via GitHub issues and pull requests.

Current:

  • Robin David
  • Riccardo Mori
  • Roxane Cohen

Past:

  • Alexis Challande
  • Elie Mengin
