• This repository has been archived on 28/Mar/2023
An LLVM sanitizer tutorial

llvm-sanitizer-tutorial and documentation

This is a tutorial on how to build an LLVM sanitizer.

Background

An LLVM sanitizer is a powerful tool used to instrument and analyze programs. This GitHub repo holds an example sanitizer and step-by-step documentation for integrating a sanitizer into the toolchain. This sanitizer can serve as a template for building more complex tools. For more information on what sanitizers are, see the related blog post: https://blog.trailofbits.com/2019/06/25/creating-an-llvm-sanitizer-from-hopes-and-dreams/

Quickstart: Building the toolchain & running a sanitizer

There are three patch files in this repo, one each for LLVM, clang, and compiler-rt. The install script will download LLVM version 8, apply the patches, and put the new files in their appropriate locations.

This will be stored as a .patch shortly

# Clone the repo
git clone https://github.com/trailofbits/llvm-sanitizer-tutorial.git && cd llvm-sanitizer-tutorial/llvm
# Make the build dir
mkdir build && cd build
# Configure and build; LLVM has many more configuration options
cmake -DLLVM_TARGETS_TO_BUILD="X86" .. && make
# Compile the example target with the new sanitizer, then run it
cd bin && ./clang -fsanitize=testsan -g -o malloc_target ../../../target_programs/malloc_target.c
./malloc_target

You should see output from the LLVM pass at compile time and additional output from the runtime component when the program is executed. Most of this readme comes from the blog post above, but in this repo I'm going to list all the tedious technical details that didn't make it past editing. Note that this post only covers how to build a sanitizer for Linux.

Building an out of source pass

Why build out of source first? Building your instrumentation pass out of source is a good first step when building your sanitizer. It lets you debug your pass and determine whether it is functioning correctly. Once the LLVM toolchain is built, you can use the opt tool to run your pass on bitcode and the llvm-dis tool to view the resulting IR.

./clang -c -emit-llvm ../../../target_programs/malloc_target.c -o malloc_target.bc
./opt -load ../lib/LLVMTestPass.so -testfunc < malloc_target.bc > malloc_instrumented.bc 
./llvm-dis < malloc_instrumented.bc | less

The first thing to do is create your pass. Check out llvm/lib/Transforms/TestPass/TestPass.cpp for the code I'm going to be referencing. The LLVM module is the largest unit of compilation; it essentially represents the file. The function and basic block passes operate at those respective levels. The module pass just prints out the function names, the function pass instruments function entries, and the basic block pass inserts function calls after malloc. The symbols for these inserted calls will be defined inside our runtime component. At the bottom of the file there are a few lines of code that register the pass with opt. Later on these will be removed and replaced with functions that create the pass object. Those functions will be called by the LLVM pass manager when you specify your sanitizer to clang. To build this module, create a new directory in llvm/lib/Transforms/ and use the add_llvm_library macro. You can copy the TestPass or the Hello cmake files for reference.
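To make the effect of the passes concrete, here is a minimal sketch of what an instrumented function does, written by hand in plain C++ rather than generated by LLVM. The hook names (testsan_function_entry, testsan_after_malloc), the event log, and make_buffer are all hypothetical and are not taken from the repo; the real pass inserts equivalent calls at the IR level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical event log standing in for the runtime's printed output.
static std::vector<std::string> g_events;

// Hypothetical runtime hooks; in a real sanitizer these live in compiler-rt.
extern "C" void testsan_function_entry(const char *name) {
  assert(name != nullptr);
  g_events.push_back(std::string("enter ") + name);
}

extern "C" void testsan_after_malloc(void *ptr, std::size_t size) {
  if (ptr == nullptr) {
    g_events.push_back("malloc failed");
  } else {
    g_events.push_back("malloc " + std::to_string(size));
  }
}

// A target function as it would behave after instrumentation.
void *make_buffer(std::size_t size) {
  testsan_function_entry("make_buffer");  // what the function pass adds
  void *buf = std::malloc(size);
  testsan_after_malloc(buf, size);        // what the basic block pass adds
  return buf;
}
```

Calling make_buffer(16) records one entry event followed by one malloc event, which is roughly the order of runtime output you would expect from an instrumented binary.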

Building a runtime component

Sanitizer runtimes are located in llvm/projects/compiler-rt/lib/. The sanitizer runtime component supplies the runtime functions that the transformation pass calls into. In the testsan directory, there is an example runtime that defines some functions and shows how to use the interceptor interface. The actual mechanics of the INTERCEPTOR macro differ based on the OS; on Linux it replaces the symbol address and uses dlsym to resolve the real function address. There are two other things to take note of in this example.

  • The macro SANITIZER_INTERFACE tells compiler-rt that it needs to export that function symbol because it might be called by the instrumented program.
  • The init function contains macro magic; it is designed to run immediately upon being loaded. This is done either by placing the function in the .preinit_array section or with the constructor attribute.
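The two points above can be sketched without compiler-rt, assuming GCC or Clang on Linux. EXAMPLE_INTERFACE is a hypothetical stand-in for the compiler-rt interface macro, and every function name below is made up for illustration; the sketch only shows the constructor-attribute variant of early initialization, not the .preinit_array one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical export macro: C linkage plus default symbol visibility, so the
// instrumented program can resolve the hook by its unmangled name.
#define EXAMPLE_INTERFACE extern "C" __attribute__((visibility("default")))

static bool g_runtime_ready = false;
static std::size_t g_bytes_seen = 0;

// The constructor attribute asks the loader to run this before main().
__attribute__((constructor)) static void example_runtime_init() {
  g_runtime_ready = true;
  std::fprintf(stderr, "example runtime initialized\n");
}

// A hook the instrumentation pass would call after each malloc.
EXAMPLE_INTERFACE void example_after_malloc(void *ptr, std::size_t size) {
  assert(g_runtime_ready && "hook called before runtime init");
  if (ptr != nullptr) g_bytes_seen += size;
}

// Accessor so a caller can inspect what the runtime has recorded.
EXAMPLE_INTERFACE std::size_t example_bytes_seen() { return g_bytes_seen; }
```

Because the init function runs at load time, the hooks can assume their state is set up by the time any instrumented code calls them.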

There are a few steps required to build the runtime component. Look at the testsan cmake file for an example of how to use these cmake macros. If you are building on Linux, you can probably just copy it and replace testsan with the name of your sanitizer. If anything is unclear, the macros are defined in compiler-rt/cmake.

  • Create a directory for your source in llvm/projects/compiler-rt/lib/
  • In the cmake file you need to
    • Add the component to compiler-rt
    • Use the add_compiler_rt_runtime macro to add your runtime
      • Make sure to include the RTCommon libs and interceptor lib if you use them.
    • Use add_sanitizer_rt_symbols to generate the interface symbols

The next step is modifying compiler-rt/cmake/config-ix.cmake. This file is a part of the compiler-rt build system. It sets the variables that decide whether your component can be built, based on the operating system and architectures you declare. The file is rather large, so feel free to search for TESTSAN and testsan to find the right places.

  • Add your sanitizer name to the list of all sanitizers. The cmake file in the lib directory iterates over the sanitizers in this list to decide which ones to try to build.
  • Define your sanitizer's supported architectures (X86, X86_64).
  • Check whether the operating system is supported for your architecture, and set the build flag to true.

At this point you should be able to build your runtime component simply by building the toolchain.

Defining the sanitizer/Modifying the driver

These steps are what you need to do to define the sanitizer and set up the compiler driver to be ready for integration.

  • In llvm/tools/clang/include/clang/Basic/Sanitizers.def, add your sanitizer using the macro, like all the others.
  • In llvm/tools/clang/include/clang/Driver/SanitizerArgs.h, add a quick helper function that checks whether the runtime is required. For an example, see the needsDfsanRt() function. This step is not strictly needed, because the check is simple enough to inline anywhere, but more complex sanitizers can put more complicated logic in SanitizerArgs.cpp.
  • In clang/lib/CodeGen/BackendUtil.cpp, check whether your sanitizer is enabled and, if it is, set the pass to run last. You can look at any of the other sanitizers for reference; it's just boilerplate.
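The "add it using the macro" step relies on the X-macro pattern: one list of entries is expanded several times into different pieces of code. The sketch below shows that pattern in a self-contained form. It is an illustration of the technique, not clang's code: the list macro, the enum, and parseSanitizer are hypothetical, and clang keeps its real list in Sanitizers.def rather than in an inline macro.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical list of (flag name, identifier) pairs.
// Adding a sanitizer means adding one line here.
#define EXAMPLE_SANITIZERS(X) \
  X("address", Address)       \
  X("dataflow", DataFlow)     \
  X("testsan", TestSan)

// First expansion: one enumerator per sanitizer.
enum class SanitizerId : std::uint8_t {
#define AS_ENUM(NAME, ID) ID,
  EXAMPLE_SANITIZERS(AS_ENUM)
#undef AS_ENUM
  Count
};

// Second expansion: map a -fsanitize= style name to its enumerator.
inline bool parseSanitizer(const char *name, SanitizerId &out) {
  assert(name != nullptr);
#define AS_LOOKUP(NAME, ID)           \
  if (std::strcmp(name, NAME) == 0) { \
    out = SanitizerId::ID;            \
    return true;                      \
  }
  EXAMPLE_SANITIZERS(AS_LOOKUP)
#undef AS_LOOKUP
  return false;
}
```

With this layout, a single new list entry makes the sanitizer visible to both the enum and the name lookup, which is why the real step is only a one-line change.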

Integrating a pass

This takes just a few steps; the work is mostly done, since the pass is already written. The only thing left is to add it to the internal build system and help the driver find it.

  • Copy your out-of-source pass code into llvm/lib/Transforms/Instrumentation.
  • Remove the three lines that register with opt and replace them with functions that create your passes. Check the TestPass.cpp file for a reference.
  • Edit the CMake file to include your pass.

Now that you have an internal instrumentation pass, it is time to add it to the pass manager.

  • Declare the prototype of the function you just made in llvm/lib/Transforms/Instrumentation in llvm/include/llvm/Transforms/Instrumentation.h. This way the driver can see it.
  • Create a new function in clang/lib/CodeGen/BackendUtil.cpp that adds your pass to the manager. You can look at the addTestSanitizer function for reference; it's all boilerplate.
  • Later in the same file there is a function called CreatePasses. In it, check whether your sanitizer is enabled and, if it is, add your pass.

Integrating a runtime component

  • In clang/lib/Driver/CommonArgs.cpp, the driver calls collectSanitizerRuntimes to decide which runtimes should be used. Add a check like the others to see whether your sanitizer should be used and, if it should, add it to the list of static runtimes.
  • This part depends on your operating system. In clang/lib/Driver/ToolChains/Linux.cpp, find the getSupportedSanitizers function and add your sanitizer to the list if the architectures are correct.
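The two driver steps boil down to a mask check followed by a list append. The toy model below shows that logic in isolation. None of it is clang's actual code: the bit values, the Arch enum, both function names, and the runtime name strings are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bitmask of sanitizers and a simplified target architecture.
enum SanitizerBit : std::uint32_t { kAddress = 1u << 0, kTestSan = 1u << 1 };
enum class Arch { X86, X86_64, Other };

// Toy analogue of getSupportedSanitizers: allow testsan only on x86 targets.
inline std::uint32_t supportedSanitizers(Arch arch) {
  std::uint32_t mask = kAddress;
  if (arch == Arch::X86 || arch == Arch::X86_64) mask |= kTestSan;
  return mask;
}

// Toy analogue of the runtime collection step: choose static runtimes to link.
inline std::vector<std::string> staticRuntimes(std::uint32_t requested,
                                               Arch arch) {
  assert((requested & ~(kAddress | kTestSan)) == 0 && "unknown sanitizer bit");
  const std::uint32_t enabled = requested & supportedSanitizers(arch);
  std::vector<std::string> runtimes;
  if (enabled & kAddress) runtimes.push_back("asan");
  if (enabled & kTestSan) runtimes.push_back("testsan");
  return runtimes;
}
```

In this model, requesting testsan on an unsupported architecture yields an empty runtime list, which mirrors why a sanitizer missing from the supported list never gets its runtime linked.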

Some other things I learned

Your IR passes will be operating-system agnostic, but other parts of the toolchain are not. When integrating your sanitizer, you will have to perform different build operations for OSX, Windows, and so on. Fortunately, compiler-rt hides a lot of the nastiness from you. Try to use the sanitizer interface; it could save you some headaches.

If you are having issues with the cmake build system, double check that you didn't make any typos. For example, if you wrote the architecture as ${x86}, it needs to be ${X86}.

Overall this internship was a great experience, and I hope that this repo documents what I learned so the rest of you can build sanitizers without needing to comb through the toolchain. Below are some of the helpful resources I linked in the blogpost, they are all really great.

Other notes

There is actually more that goes into sanitizer development that I didn't cover here. I think the best way to learn is to look at sanitizer pull requests and see what they modify and change.

Helpful resources

https://blog.trailofbits.com/2019/06/25/creating-an-llvm-sanitizer-from-hopes-and-dreams/

https://eli.thegreenplace.net/

https://www.cs.cornell.edu/~asampson/blog/llvm.html

https://llvm.org/docs/LangRef.html

https://llvm.org/devmtg/2018-04/

https://reviews.llvm.org/D32199

Maintainer

Carson Harmon (@ThatsNotVeryCashMoneyOfYou)
