• Stars
    star
    284
  • Rank 141,104 (Top 3 %)
  • Language LLVM
  • License
    BSD 3-Clause "New...
  • Created almost 9 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A tool for analyzing x86-64 binaries.

reopt

Reopt is a general purpose decompilation and recompilation tool for repurposing application logic. It does this by analyzing machine code to recover a more flexible program representation -- specifically the LLVM assembly language. Once in this format, one can then apply optimization tools to optimize the LLVM, recompile the application into optimized or security hardened object code, and use Reopt to merge the recompiled code back into the original executable.

Reopt supports Linux x86_64 programs. We are working towards a full 1.0 release, but the current pre-release version supports the end-to-end recompilation toolchain.

Getting Reopt

Although Reopt can build on other POSIX systems such as OSX, we recommend building Reopt to run on Linux. Reopt currently only supports Elf binaries which are the default binary format for Linux. It does not support OSX Macho binaries, and so it is easier to find applications to try Reopt on when running Linux.

Gitpod

For most people, the easiest way to try out Reopt is to try it out on Gitpod. This requires an account on Gitpod, but gives you access to a VSCode IDE connected to a Linux container with Reopt pre-installed.

Github Releases

If you have Linux installed, you can download one of our recent releases from the Releases page. We build releases as static binaries on Centos 7, so they should work on a variety of distributions.

Docker

If you have Docker installed, you can install and run the Reopt pre-release Docker image by running:

docker pull galoisbinaryanalysis/reopt
docker run --rm -it galoisbinaryanalysis/reopt

Building from source

Building Reopt requires that one has installed the GHC Haskell compiler and supporting tooling. We currently build on GHC 8.10.4. An easy way to get GHC is to install ghcup, and run ghcup install ghc-8.10.4. We also maintain a Docker image that has GHC and other dependencies preinstalled for building Reopt.

Once GHC is installed, the following steps may be useful for building Reopt:

git clone https://github.com/GaloisInc/reopt.git

cd reopt
# Fix submodule URLs (can skip if you have a Github account)
sed -i 's/[email protected]:/https:\/\/github.com\//' .gitmodules
git submodule update --init
# Build Reopt
cabal install exe:reopt
# Build Reopt Explore
cabal install exe:reopt-explore

Reopt and Reopt Explore will be installed at $HOME/.cabal/bin/reopt $HOME/.cabal/bin/reopt-explore.

Reopt's verification condition generator (reopt-vcg) is included in the aforementioned Github release and Docker image, however the source is currently maintained in a separate repository with it's own build instructions and requirements.

Using Reopt

Once reopt is installed on a Linux system and included in your path, you can try running it on system utilities such as ls. To do an end-to-end recompilation, you can run reopt with the command.

$ reopt -o ls.exe $(which ls)

This execution will use the version of ls in your system path and produce an executable ls.exe in the current directory. When running reopt will print out messages as it discovers functions within the application and attempts to convert each discovered function into LLVM.

Inspecting intermediate state

During recompilation, Reopt has to do a complex series of analysis steps to lift the machine code into LLVM. Each of these analysis steps is incomplete and may fail either due to Reopt not recognizing features in the binary or an error in our prerelease version of Reopt. As such, do not be alarmed when Reopt fails to translate functions.

If you'd like to inspect Reopt's intermediate state, there are several command line flags to export intermediate results. We describe the main flags for exporting intermediate state below. Additional options can be viewed by running reopt --help.

  • Disassembly. reopt --disassemble <binary> provides a raw disassembler output view of the code in the binary. This is similiar to objdump's disassembly output.

  • Control flow graph construction. reopt --cfg <binary> displays the low level control flow graphs that Reopt has constructed for each discovered function within the binary. This is a low-level IR that maintains machine code's explicit stack and register references, but lifts the machine code instructions into a more architectural neutral register transfer language.

  • Function Recovery reopt --export-fns <path> <binary> writes the functions that Reopt has generated after performing stack and function argument analysis. This is a higher-level IR in which explicit references to the stack have been replaced with allocations, and functions take arguments.

  • LLVM Generation reopt --export-llvm <path> <binary> generates LLVM from the binary. This is essentially a version of function recovery rendered in LLVM's format. Providing the --annotations <ann_file> flag during LLVM generation will cause reopt to additionally emit JSON in <ann_file> describing verification conditions which (if valid) demonstrate functional equivalence between the generated LLVM and machine code. Running reopt-vcg <ann_file> will simulate the executation of the LLVM and machine code, block-by-block, leveraging an SMT solver (cvc4) to verify as many of the conditions as possible.

  • Object Files reopt --export-object <path> <binary> generates an object file from the LLVM generated in the previous state. This is essentially the same as generating the LLVM, and then running the LLVM compiler toolchain with the selected options.

Function arguments

One common reason Reopt fails is because it cannot figure out the arguments that a function can take. We have four mechanisms for obtaining function arguments: (1) User provided hints; (2) a small builtin database; (3) debug information; and (4) a demand analysis that looks at what registers are used to infer arguments. These mechanisms are listed in priority order, although we note that the builtin database is currently the only mechanism for supporting functions that take a variable number of arguments like printf.

If you'd like to provide hints to Reopt, the recommended way is write a C header file with the arguments, such as:

// decls.h


typedef long ssize_t;
typedef unsigned long size_t;

ssize_t read(int fd, void* buf, size_t count);
ssize_t write(int fd, const void* buf, size_t count);

You can then use this file to tell Reopt about the expected types for read and write via the --header flag, e.g.,

reopt -o ls.exe --header decls.h $(which ls)

Using OCCAM for additional optimizations

reopt can leverage the OCCAM whole-program partial evaluator for LLVM bitcode to further optimize binaries (assuming a user has already installed and made available both OCCAM and its accompanying interface slash).

This feature can be enabled by passing the --occam-config=FILE option to reopt, where FILE is the reopt/OCCAM manifest. The manifest should essentially a valid OCCAM manifest file (i.e., a file with JSON entries) with the following (optional) additional field:

  • slash_options: a list of command line option flags for OCCAM's slash tool,

and excluding the following fields (reopt will populate these appropriately):

  • binary
  • name

The main field should specify the desired name of the bitcode file that will be generated for OCCAM to process, and the OCCAM optimized result will share the name with an added .occam suffix.

N.B., when passing flags to customize OCCAM/slash behavior, be aware that reopt passes the -c and -emit-llvm flags via the ldflags manifest entry so OCCAM skips recompiling and acts only as an LLVM to LLVM translator.

Using Reopt Explore

With reopt-explore installed we can gather statistics regarding reopt's ability to recover functions in an individual or collection of binaries.

To examine a single binary, simply call reopt-explore with the a path to the binary:

$ reopt-explore llvm $(which ls)
...
/usr/bin/ls
  Initialization:
    Code segment: 112,004 bytes
    Initial entry points: 234
    Warnings: 0
  Discovery:
    Bytes discovered: 59,502 (53%)
    Succeeded: 216 (92%)
    Failed: 18 (8%)
      Unhandled instruction: 1 (0%)
      Unidentified control flow: 17 (7%)
  Argument Analysis:
    Succeeded: 123 (57%)
    Failed: 93 (43%)
    Header Warnings: 0
    DWARF Warnings: 0
    Code Warnings: 112
  Invariant Inference:
    Succeeded: 92 (75%)
    Failed: 31 (25%)
      Indirect call target: 1 (1%)
      Unresolved call target arguments: 30 (24%)
  Recovery:
    Succeeded: 81 (88%)
    Failed: 11 (12%)
      Unsupported function value: 8 (9%)
      Unimplemented LLVM backend feature: 3 (3%)
  LLVM generation status: Succeeded.

To recursively search a directory for binaries and examine each, call reopt-explore with the path to the directory to search:

$ reopt-explore llvm /usr/bin
...
reopt analyzed 394 binaries:
Generated LLVM bitcode for 394 out of 394 binaries.
Initialization:
  Code segment: 42,933,178 bytes
  Initial entry points: 79776
  Warnings: 0
Discovery:
  Bytes discovered: 23,025,164 (54%)
  Succeeded: 64,494 (81%)
  Failed: 15,500 (19%)
    Unhandled instruction: 425 (1%)
    Unidentified control flow: 15,075 (19%)
Argument Analysis:
  Succeeded: 40,429 (63%)
  Failed: 24,065 (37%)
  Header Warnings: 0
  DWARF Warnings: 0
  Code Warnings: 38,681
Invariant Inference:
  Succeeded: 30,221 (75%)
  Failed: 10,208 (25%)
    Symbolic call stack height: 1 (0%)
    Unresolved stack read: 13 (0%)
    Indirect call target: 526 (1%)
    Call target not function entry point: 41 (0%)
    Unresolved call target arguments: 9,614 (24%)
    Could not resolve varargs args: 13 (0%)
Recovery:
  Succeeded: 21,952 (73%)
  Failed: 8,269 (27%)
    Unsupported function value: 2,425 (8%)
    Unimplemented feature: 6 (0%)
    Unimplemented LLVM backend feature: 4,762 (16%)
    Stack offset escape: 83 (0%)
    Stack read overlapping offset: 1 (0%)
    Unresolved return value: 8 (0%)
    Missing variable value: 984 (3%)

Improving recovery with debug information

reopt and reopt-explore will try to determine if any debug information is available for dynamic dependencies by querying gdb (if it is installed).

Users can also manually specify dependency and debug directories to search in manually for both reopt and reopt-explore via the following flags:

--lib-dir=PATH              Additional location to search for dynamic
                            dependencies.
--debug-dir=PATH            Additional location to search for dynamic
                            dependencies' debug info.

Contributing

The project has been contributed to by many authors over many years without much coordination on code style and library usage. For our currently-maintained sub-projects, we favor the following, without enforcing them aggressively (i.e. you may still find instances where those are not used):

  • We use the fourmolu code formatter, informed by the fourmolu.yaml present in the root directory.

  • We are trying to move to optparse-applicative for CLI argument parsing (though there remain some instances of cmdargs for executables we are not actively maintaining).

  • We use prettyprinter for pretty-printing.

  • We tend to use labeled generic lenses for accessing complex data types in read/write. We still have a mixed use of named- and symbol-based lens operators.

  • We do not yet have a very nice logging story. Currently, the code outputs to stdout/stderr as it deems necessary, but we ought to use a more principled logging discipline.

More Repositories

1

cryptol

Cryptol: The Language of Cryptography
Haskell
1,105
star
2

HaLVM

The Haskell Lightweight Virtual Machine (HaLVM): GHC running on Xen
Haskell
1,049
star
3

crucible

Crucible is a library for symbolic simulation of imperative programs
Rust
594
star
4

saw-script

The SAW scripting language.
Haskell
433
star
5

ivory

The Ivory EDSL
Haskell
374
star
6

haskell-tor

A Haskell implementation of the Tor protocol.
Haskell
267
star
7

swanky

A suite of rust libraries for secure multi-party computation
Rust
236
star
8

macaw

Open source binary analysis tools.
Haskell
188
star
9

MATE

MATE is a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code using Code Property Graphs.
Python
170
star
10

what4

Symbolic formula representation and solver interaction library
Haskell
146
star
11

HaNS

The haskell network stack
Haskell
113
star
12

cclyzerpp

cclyzer++ is a precise and scalable pointer analysis for LLVM code.
C++
111
star
13

cereal

Haskell
77
star
14

ec2-unikernel

Tool for uploading unikernels into EC2
Haskell
77
star
15

smaccmpilot-build

An umbrella repository including all of the dependencies to build the smaccmpilot project
Makefile
72
star
16

dlkoopman

A general-purpose Python package for Koopman theory using deep learning.
Python
71
star
17

blt

Lattice-based integer linear programming solver
C++
60
star
18

parameterized-utils

A set of utilities for using indexed types including containers, equality, and comparison.
Haskell
57
star
19

grift

Galois RISC-V ISA Formal Tools
Haskell
54
star
20

daedalus

The Daedalus data description language
Haskell
53
star
21

saw-core

The SAW core language.
Haskell
53
star
22

ddosflowgen

Simulate DDoS attacks and generate traffic datasets
Python
51
star
23

yapall

A precise and scalable pointer analysis for LLVM, written in Ascent
C
50
star
24

minlibc

C
49
star
25

llvm-pretty-bc-parser

Parser for the llvm bitcode format
LLVM
48
star
26

renovate

A library for binary analysis and rewriting
Haskell
46
star
27

FreeRTOS-Xen

FreeRTOS 7.6.0 ported to run as a Xen guest on ARM systems.
C
45
star
28

hacrypto

Experiments in high-assurance crypto.
C
43
star
29

tower

A concurrency framework for the Ivory language
Haskell
41
star
30

lean4-balance-car

Lean4 port of Arduino balance car controller
C++
40
star
31

lean-protocol-support

This project contains various supporting libraries for lean to reason about protocols.
Lean
38
star
32

jvm-parser

A Haskell parser for JVM bytecode files
Haskell
37
star
33

flexdis86

A library for disassembling x86-64 binaries.
Haskell
36
star
34

halfs

The Haskell File System: A file system implementation in Haskell
Haskell
36
star
35

elf-edit

The elf-edit library provides a datatype suitable for reading and writing Elf files.
Haskell
35
star
36

mir-verifier

SAW front end for the MIR language from rustc
HTML
34
star
37

semmc

Stratified synthesis for learning machine code instruction semantics
Haskell
34
star
38

fancy-garbling

Rust implementation of the BMR16 arithmetic garbling scheme.
Rust
33
star
39

smaccmpilot-stm32f4

SMACCMPilot flight controller
JavaScript
32
star
40

pure-zlib

A Haskell-only implementation of zlib / DEFLATE.
Haskell
31
star
41

helib-demos

Experiments in homomorphic encryption
C++
30
star
42

FiveUI

Extensible UI Analysis in your browser
JavaScript
29
star
43

lean-llvm

LLVM support for the lean theorem prover
Lean
28
star
44

galua

Lua debugger and interpreter
JavaScript
27
star
45

BESSPIN

Top-level repository including all relevant BESSPIN repository
26
star
46

curl

A Haskell binding to the curl library
Haskell
26
star
47

json

Haskell JSON library
Haskell
25
star
48

xml

Haskell XML library
Haskell
25
star
49

estimator

State-space estimation algorithms and models
Haskell
25
star
50

dismantle

A library of assemblers and disassemblers derived from LLVM TableGen data
HTML
24
star
51

http-server

A Haskell HTTP server
Haskell
23
star
52

llvm-verifier

The LLVM Symbolic Simulator, part of SAW.
Haskell
21
star
53

RSA

Haskell RSA Library
Haskell
20
star
54

msf-haskell

Haskell implementation of Metasploit remote API
Haskell
20
star
55

cryfsm

convert simple cryptol expressions into finite-state machines
Haskell
20
star
56

lean-haskell-bindings

Haskell Bindings to the Lean Theorem Prover http://leanprover.github.io/
Haskell
19
star
57

FAW

Galois Format Analysis Workbench
HTML
19
star
58

sqlite

A simple sqlite3 library for Haskell
C
19
star
59

BLST-Verification

BLST-Verification
Python
19
star
60

cryptol-specs

A central repository for specifications of cryptographic algorithms in Cryptol
TeX
18
star
61

surveyor

A symbolic debugger for C/C++ (via LLVM), machine code, and JVM programs
Haskell
18
star
62

mistral

An interpreter for the Mistral language.
Haskell
18
star
63

LinearArbitrary-SeaHorn

LinearArbitrary-SeaHorn is a CHC solver for LLVM-based languages.
C
17
star
64

cryptol-semantics

Semantics for Cryptol
Coq
16
star
65

e2eviv

Artifacts associated with the U.S. Vote Foundation's E2E-VIV Project (end-to-end verifiable internet voting).
TeX
16
star
66

scuttlebutt

Multi-party computation utilities toolkit for rust
Rust
15
star
67

halvm-web

Haskell
15
star
68

HARDENS

Repository for the HARDENS project
Jupyter Notebook
15
star
69

pads-haskell

A domain specific language for processing ad-hoc data.
Haskell
14
star
70

ardupilot-mega

Fork: DO NOT SUBMIT PULL REQS/BUG REPORTS HERE
C
14
star
71

pate

Patches Assured up to Trace Equivalence
Haskell
14
star
72

sat2015-crypto

Slides and examples to accompany the September 25th invited talk at SAT 2015
TeX
14
star
73

ICryptol

IPython-style interaction for Cryptol
Haskell
14
star
74

reopt-vcg

A translation verifier for Reopt (https://github.com/GaloisInc/reopt)
SMT
13
star
75

ivory-tower-stm32

Tower backend and Ivory board support package for the STM32 line of microcontrollers
C
12
star
76

AMIDOL

Scientific model creation toolset.
HTML
12
star
77

hpb

Haskell Protocol Buffers
Haskell
12
star
78

golang

Parser and type analysis for the Go programming language
Haskell
12
star
79

mime

A Haskell MIME library
Haskell
12
star
80

LIMA

LIMA: Language for Integrated Modeling and Analysis
Haskell
12
star
81

gec

Embedded-friendly crypto a la SMACCM
Haskell
12
star
82

SHA

Haskell implementation of SHA / SHA2 hash functions
Haskell
12
star
83

alex-tools

A Haskell library making it easier to write Alex lexers.
Haskell
12
star
84

nasa-affirm

Architectural Framework For Integrated Refinement Modeling
HTML
11
star
85

regex-fsm

Convert regular expressions into efficient matrix branching programs
Haskell
11
star
86

salty

A DSL for generating GR(1) problems
JavaScript
11
star
87

simple-tar

A very simple tar archive processing library
Haskell
11
star
88

ocelot

Oblivious transfer library for rust
Rust
11
star
89

hexdump

A human readable style for binary data.
Haskell
11
star
90

gidl

Gidl: an Interface Description Language
Haskell
11
star
91

saw-core-coq

A translator from SAWCore to Coq
Coq
11
star
92

BESSPIN-Voting-System-Demonstrator-2019

The BESSPIN Voting System. This system is used to demonstrate and red team SSITH secure CPUs.
Coq
11
star
93

argo

A Haskell library for building JSON-RPC servers (work in progress), with servers for Cryptol and SAW
Haskell
11
star
94

lustre

A parser and AST for Lustre
Haskell
11
star
95

rustwall

Rust firewall for seL4
Rust
10
star
96

csaf

Control Systems Analysis Framework - a framework to minimize the effort required to evaluate, implement, and verify controller design (classical and learning enabled) with respect to the system dynamics.
Python
10
star
97

ivory-rtverification

Runtime verification for C code via a GCC plugin architecture.
Haskell
10
star
98

BESSPIN-GFE-2019

The DARPA SSITH-funded Government Furnished Equipment on which all secure CPUs are based.
Verilog
9
star
99

cryptol-verifier

The Cryptol Symbolic Simulator, part of SAW.
Haskell
9
star
100

network-hans

HaNS <-> Network Shims for easier porting to HaNS and the HaLVM
Haskell
9
star