A fast and accurate disassembler

Datalog Disassembly

ddisasm is a fast disassembler that is accurate enough for the resulting assembly code to be reassembled. It is implemented in the Datalog (Souffle) declarative logic programming language, which is used to compile the disassembly rules and heuristics. The disassembler first parses ELF file information and decodes a superset of possible instructions to create an initial set of Datalog facts. These facts are analyzed to identify code locations, symbolization, and function boundaries. The results of this analysis, a refined set of Datalog facts, are then translated to the GTIRB intermediate representation for binary analysis and reverse engineering. The GTIRB pretty printer can then be used to print the GTIRB as reassemblable assembly code.

ddisasm supports disassembling ELF and PE binary formats on x86_32, x86_64, ARM32, ARM64, and MIPS32 architectures.

Usage

ddisasm can be used to disassemble an ELF binary:

ddisasm examples/ex1/ex --asm ex.s

The generated assembly can then be rebuilt with gcc:

gcc -nostartfiles ex.s -o ex_rewritten

Installing

There are a number of options to install a pre-built copy of ddisasm:

  • Docker image published to Docker Hub
  • Ubuntu apt packages published to the GTIRB apt repository
  • .zip archives of the Windows build published to the GrammaTech fileserver

These options offer stable and unstable variants. It is critical to install a consistent set of tools, using tools that are all stable or all unstable; a mix of stable and unstable tools will likely not work. The stable versions are recommended for most users. The unstable versions reflect the latest state of the development branch, and may include bugs and unannounced breaking changes.

Note that installing the gtirb Python package from pip yields a stable package, which will only work with corresponding stable versions of ddisasm; see the GTIRB README for more details.

Docker

The Docker image is the easiest way to download and try ddisasm quickly.

  • grammatech/ddisasm:latest - the latest stable version
  • grammatech/ddisasm:unstable - the latest unstable version
  • grammatech/ddisasm:1.5.7 - a specific release of ddisasm

Explore the available tags at https://hub.docker.com/r/grammatech/ddisasm

Ubuntu

Packages for Ubuntu 20 are available in the GTIRB apt repository and may be installed per the following instructions.

First, add GrammaTech's APT key.

wget -O - https://download.grammatech.com/gtirb/files/apt-repo/conf/apt.gpg.key | sudo apt-key add -

Next, update your sources.list file.

echo "deb https://download.grammatech.com/gtirb/files/apt-repo [distribution] [component]" | sudo tee -a /etc/apt/sources.list

Where:

  • [distribution] is focal (currently, only Ubuntu 20 packages are available)
  • [component] is either stable, which holds the last versioned release, or unstable, which holds the HEAD of the repository.

Finally, update your package database and install the core GTIRB tools:

sudo apt-get update
sudo apt-get install gtirb-pprinter ddisasm
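The placeholder substitution in the sources.list entry can be sketched in Python. This is only an illustration: the function and constant names are hypothetical, and the accepted values are taken directly from the text above (focal; stable or unstable).

```python
# Hypothetical helper: builds the sources.list line described above.
REPO_URL = "https://download.grammatech.com/gtirb/files/apt-repo"
VALID_DISTRIBUTIONS = {"focal"}              # only Ubuntu 20 packages are listed
VALID_COMPONENTS = {"stable", "unstable"}    # last release vs. repository HEAD


def sources_list_entry(distribution, component):
    """Return the deb line for the given distribution and component."""
    if distribution not in VALID_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    if component not in VALID_COMPONENTS:
        raise ValueError(f"unsupported component: {component!r}")
    return f"deb {REPO_URL} {distribution} {component}"
```

Calling sources_list_entry("focal", "stable") yields the line to append to /etc/apt/sources.list for the stable packages.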

Warning: There is a problem with the packages in the stable repository that will cause conflicts if you try apt-get upgrade. In this case, uninstall and reinstall the packages you got from the GTIRB repository. You may need to use dpkg --remove to remove the metapackages (e.g. ddisasm) before removing the concrete versioned packages (e.g. ddisasm-1.5.1).

Windows

Windows releases are packaged as .zip files and are available at https://download.grammatech.com/gtirb/files/windows-release/.

Dependencies

ddisasm is written in C++17 and requires a compiler that supports that standard, such as gcc 9, clang 6, or MSVC 2017.

To build ddisasm from source, the following requirements should be installed:

Note that these versions are newer than what your package manager may provide by default; this is the case on Ubuntu 18, Debian 10, and others. Prefer building these dependencies from source to avoid versioning problems. Alternatively, you can use the GrammaTech PPA to get the correct versions of the dependencies. See the GTIRB README for instructions on using the GrammaTech PPA.

Building ddisasm

Use the following options to configure cmake:

  • You can tell CMake which compiler to use with -DCMAKE_CXX_COMPILER=<compiler>.

  • You can tell CMake about the paths to its dependencies as follows:

    Option               Description
    gtirb_DIR            Path to the GTIRB build directory.
    gtirb_pprinter_DIR   Path to the gtirb-pprinter build directory.
    LIEF_DIR             Path to the LIEF build directory.

  • ddisasm can make use of GTIRB in static library form (instead of shared library form, the default) if you use the flag -DDDISASM_BUILD_SHARED_LIBS=OFF.

  • You can tell CMake to use ccache with the flag -DCMAKE_CXX_COMPILER_LAUNCHER=ccache. This is especially useful when Souffle is configured to generate multiple files.

  • For development, you can ask Souffle to generate multiple files per target with -DDDISASM_GENERATE_MANY=ON. This results in a slower initial build time, but recompilation will be faster.

Once the dependencies are installed, you can configure and build as follows:

$ cmake ./ -Bbuild
$ cd build
$ make

When using -DDDISASM_GENERATE_MANY=ON, it is safe to aggressively parallelize the build (e.g. -j$(nproc)). This is not recommended otherwise, as memory usage by the compiler is high.
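As a rough sketch, the configure and build options above can be assembled into command lines with a small Python helper. The function names and keyword arguments are hypothetical. Passing the dependency paths as -D<name>=<path> is an assumption based on the usual way CMake variables are set on the command line.

```python
import os


def cmake_configure_args(source_dir="./", build_dir="build", *, compiler=None,
                         dependency_dirs=None, shared_libs=True,
                         use_ccache=False, generate_many=False):
    """Build the cmake configure command as an argument list."""
    args = ["cmake", source_dir, f"-B{build_dir}"]
    if compiler:
        args.append(f"-DCMAKE_CXX_COMPILER={compiler}")
    # e.g. {"gtirb_DIR": "/path/to/gtirb/build"}; passed as CMake variables
    # (assumption: the standard -D<name>=<path> form).
    for name, path in (dependency_dirs or {}).items():
        args.append(f"-D{name}={path}")
    if not shared_libs:
        args.append("-DDDISASM_BUILD_SHARED_LIBS=OFF")
    if use_ccache:
        args.append("-DCMAKE_CXX_COMPILER_LAUNCHER=ccache")
    if generate_many:
        args.append("-DDDISASM_GENERATE_MANY=ON")
    return args


def make_args(generate_many=False):
    """Parallelize only when many files are generated, as advised above."""
    if not generate_many:
        return ["make"]
    return ["make", f"-j{os.cpu_count() or 1}"]
```

The returned lists can be passed to subprocess.run(..., check=True), running the make step from inside the build directory.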

Debug build options

One can selectively turn off ddisasm's various architecture support modules to speed up compilation time during development. For example:

$ cmake ./ -Bbuild -DDDISASM_ARM_64=OFF -DDDISASM_X86_32=OFF

will deactivate ARM_64 and X86_32 support.

Running the analysis

Once ddisasm is built, we can run a complete analysis on a file by calling build/bin/ddisasm. For example, we can run the analysis on one of the examples as follows:

cd build/bin && ./ddisasm ../../examples/ex1/ex --asm ex.s

ddisasm accepts the following parameters:

--help : produce help message

--ir arg : GTIRB output file

--json arg : GTIRB json output file

--asm arg : ASM output file

--debug : if the assembly code is printed, it is printed with debugging information

--debug-dir arg : location to write CSV files for debugging

--hints arg : location of user-provided hints file

-K [ --keep-functions ] arg : Print the given functions even if they are skipped by default (e.g. _start)

--self-diagnose : This option is useful for debugging. Use relocation information to emit a self diagnosis of the symbolization process. This option only works if the target binary contains complete relocation information. You can enable that in ld using the option --emit-relocs.

-F [ --skip-function-analysis ] : Skip additional analyses to compute more precise function boundaries.

-j [ --threads ] : Number of cores to use.

-I [ --interpreter ] arg : Execute the Souffle interpreter with the specified source directory.

-L [ --library-dir ] arg : Specify the search directory for the Souffle interpreter to locate functor libraries.

--profile arg : Generate Souffle profiling information in the specified directory.
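When ddisasm is driven from a script, a subset of the parameters above can be mapped to an argument list. The following Python sketch is illustrative only: the function and keyword names are hypothetical, and it covers only options whose form is spelled out in the list above.

```python
def ddisasm_args(binary, *, ir=None, json_out=None, asm=None, debug_dir=None,
                 hints=None, debug=False, self_diagnose=False,
                 skip_function_analysis=False, executable="ddisasm"):
    """Build a ddisasm command line as a list suitable for subprocess.run."""
    args = [executable, str(binary)]
    # Options documented as taking an argument.
    valued = [
        ("--ir", ir),
        ("--json", json_out),
        ("--asm", asm),
        ("--debug-dir", debug_dir),
        ("--hints", hints),
    ]
    for flag, value in valued:
        if value is not None:
            args += [flag, str(value)]
    # Options documented as plain switches.
    switches = [
        ("--debug", debug),
        ("--self-diagnose", self_diagnose),
        ("--skip-function-analysis", skip_function_analysis),
    ]
    args += [flag for flag, enabled in switches if enabled]
    return args
```

For instance, ddisasm_args("../../examples/ex1/ex", asm="ex.s", executable="./ddisasm") reproduces the example invocation shown earlier in this section.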

Testing

To run the test suite, run:

cd build && PATH=$(pwd)/bin:$PATH ctest

Providing User Hints

A user can provide a file with hints to guide the analysis and work around limitations in the current ddisasm implementation. User hints are simply Datalog facts that are added to the database before running the Datalog program. Hints are provided in tab-separated .csv format, where the first field is the predicate name, namespaced with the pass name, and the subsequent fields are the field values of the fact to be added.

For example, the line

disassembly.invalid 0x100 definitely_not_code

will add a fact invalid(0x100,"definitely_not_code") to the Datalog database of the disassembly pass. The fields need to be separated by tabs '\t'.
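A hints file in this format can be generated programmatically. The Python sketch below is a minimal illustration with hypothetical function names. Rendering integers in hexadecimal mirrors the 0x100 in the example and is an assumption, not a documented requirement.

```python
def hint_line(pass_name, predicate, *fields):
    """Format one hint: '<pass>.<predicate>' followed by tab-separated fields."""
    values = [f"{pass_name}.{predicate}"]
    for field in fields:
        # Assumption: addresses are written in hex, as in the example above.
        values.append(hex(field) if isinstance(field, int) else str(field))
    for value in values:
        if "\t" in value or "\n" in value:
            raise ValueError(f"field contains a tab or newline: {value!r}")
    return "\t".join(values)


def write_hints(path, hints):
    """Write an iterable of (pass_name, predicate, *fields) tuples to `path`."""
    with open(path, "w", encoding="utf-8") as out:
        for hint in hints:
            out.write(hint_line(*hint) + "\n")
```

Calling write_hints("hints.csv", [("disassembly", "invalid", 0x100, "definitely_not_code")]) writes the example line with real tab separators; the file can then be passed to ddisasm with --hints.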

Contributing

See CONTRIBUTING.md

External Contributors

  • Programming Language Group, The University of Sydney: Initial support for ARM64.

AuxData

See doc/AuxData.md

Some References

  1. Datalog Disassembly

  2. Souffle

  3. Capstone disassembler

  4. Control Flow Integrity for COTS Binaries

  5. Alias analysis for Assembly

  6. Reassembleable Disassembling

  7. Ramblr: Making reassembly great again

  8. An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries

  9. Binary Code is Not Easy
