Whole Program LLVM: wllvm ported to go

Whole Program LLVM in Go

TL;DR: A drop-in replacement for wllvm that builds the bitcode in parallel and is faster. A comparison of the two tools can be gleaned from building the Linux kernel.

Quick Start Comparison Table

wllvm command/env variable    gllvm command/env variable
--------------------------    --------------------------
wllvm                         gclang
wllvm++                       gclang++
wfortran                      gflang
extract-bc                    get-bc
wllvm-sanity-checker          gsanity-check
LLVM_COMPILER_PATH            LLVM_COMPILER_PATH
LLVM_CC_NAME ...              LLVM_CC_NAME ...
                              LLVM_F_NAME
WLLVM_CONFIGURE_ONLY          WLLVM_CONFIGURE_ONLY
WLLVM_OUTPUT_LEVEL            WLLVM_OUTPUT_LEVEL
WLLVM_OUTPUT_FILE             WLLVM_OUTPUT_FILE
LLVM_COMPILER                 not supported (clang only)
LLVM_GCC_PREFIX               not supported (clang only)
LLVM_DRAGONEGG_PLUGIN         not supported (clang only)
LLVM_LINK_FLAGS               LLVM_LINK_FLAGS

This project, gllvm, provides tools for building whole-program (or whole-library) LLVM bitcode files from an unmodified C or C++ source package. It currently runs on *nix platforms such as Linux, FreeBSD, and Mac OS X. It is a Go port of wllvm.

gllvm provides compiler wrappers that work in two phases. The wrappers first invoke the compiler as normal. Then, for each object file, they call a bitcode compiler to produce LLVM bitcode. The wrappers then store the location of the generated bitcode file in a dedicated section of the object file. When object files are linked together, the contents of the dedicated sections are concatenated (so we don't lose the locations of any of the constituent bitcode files). After the build completes, one can use a gllvm utility to read the contents of the dedicated section and link all of the bitcode into a single whole-program bitcode file. This utility works for both executable and native libraries.

For more details see wllvm.

Prerequisites

To install gllvm you need the Go toolchain.

To use gllvm you need clang/clang++/flang and the llvm tools llvm-link and llvm-ar. gllvm is agnostic to the actual llvm version. gllvm also relies on standard build tools such as objcopy and ld.

Installation

To install, run the following command, making sure to include the trailing "..." in the package path:

go get github.com/SRI-CSL/gllvm/cmd/...

This should install five binaries: gclang, gclang++, gflang, get-bc, and gsanity-check in the $GOPATH/bin directory.

If you are using go 1.16 you may be forced to install it like this:

GO111MODULE=off go get github.com/SRI-CSL/gllvm/cmd/...

Hopefully we will have a better fix for this soon?

Usage

gclang and gclang++ are the wrappers used to compile C and C++. gflang is the wrapper used to compile Fortran. get-bc is used for extracting the bitcode from a build product (either an object file, executable, library or archive). gsanity-check can be used for detecting configuration errors.

Here is a simple example. Assuming that clang is in your PATH, you can build bitcode for pkg-config as follows:

tar xf pkg-config-0.26.tar.gz
cd pkg-config-0.26
CC=gclang ./configure
make

This should produce the executable pkg-config. To extract the bitcode:

get-bc pkg-config

which will produce the bitcode module pkg-config.bc. For more on this example see here.

Advanced Configuration

If clang and the llvm tools are not in your PATH, you will need to set some environment variables.

  • LLVM_COMPILER_PATH can be set to the absolute path of the directory that contains the compiler and the other LLVM tools to be used.

  • LLVM_CC_NAME can be set if your clang compiler is not called clang but something like clang-3.7. Similarly LLVM_CXX_NAME and LLVM_F_NAME can be used to describe what the C++ and Fortran compilers are called, respectively. We also pay attention to the environment variables LLVM_LINK_NAME and LLVM_AR_NAME in an analogous way.

Another useful, and sometimes necessary, environment variable is WLLVM_CONFIGURE_ONLY.

  • WLLVM_CONFIGURE_ONLY can be set to anything. If it is set, gclang and gclang++ behave like a normal C or C++ compiler. They do not produce bitcode. Setting WLLVM_CONFIGURE_ONLY may prevent configuration errors caused by the unexpected production of hidden bitcode files. It is sometimes required when configuring a build. For example:
    WLLVM_CONFIGURE_ONLY=1 CC=gclang ./configure
    make
    

Extracting the Bitcode

The get-bc tool is used to extract the bitcode from a build artifact, such as an executable, object file, thin archive, archive, or library. In the simplest use case, as seen above, one simply does:

get-bc -o <name of bitcode file> <path to executable>

This will produce the desired bitcode file. The situation is similar for an object file. For an archive or library, there is a choice as to whether you produce a bitcode module or a bitcode archive. This choice is made by using the -b switch.

Another useful switch is -m, which, in addition to producing the bitcode, also produces a manifest of the bitcode files that made up the final product. As is typical,

get-bc -h

will list all the command-line switches. Since we use the Go flag package, the switches must precede the artifact path.
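
The ordering rule comes from the standard flag package, which stops parsing at the first non-flag argument. The sketch below demonstrates that behaviour with a made-up flag set; the name parseArgs and the single -o switch are illustrative assumptions and do not reproduce get-bc's real option handling.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs is a hypothetical stand-in for a get-bc-like command line
// with a single -o switch. It returns the parsed output name and the
// arguments that the flag package left unparsed.
func parseArgs(args []string) (output string, rest []string, err error) {
	fs := flag.NewFlagSet("get-bc-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&output, "o", "", "output bitcode file")
	err = fs.Parse(args)
	return output, fs.Args(), err
}

func main() {
	// Switch first: -o is recognised, "prog" is the artifact path.
	out, rest, _ := parseArgs([]string{"-o", "prog.bc", "prog"})
	fmt.Printf("%q %q\n", out, rest)

	// Path first: parsing stops at "prog", so -o is never seen as a switch.
	out, rest, _ = parseArgs([]string{"prog", "-o", "prog.bc"})
	fmt.Printf("%q %q\n", out, rest)
}
```

In the second call the output name stays empty and all three arguments are returned as positional, which is why a switch placed after the artifact path has no effect.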

Preserving bitcode files in a store

Sometimes, because of pathological build systems, it can be useful to preserve the bitcode files produced in a build, either to prevent their deletion or to retrieve them later. If the environment variable WLLVM_BC_STORE is set to the absolute path of an existing directory, then gllvm will copy each produced bitcode file into that directory. The name of the copied bitcode file is the hash of the path to the original bitcode file. For convenience, when using both the manifest feature of get-bc and the store, the manifest will contain both the original path and the store path.

Debugging

The gllvm tools can show various levels of output to aid with debugging. To show this output set the WLLVM_OUTPUT_LEVEL environment variable to one of the following levels:

  • ERROR
  • WARNING
  • AUDIT
  • INFO
  • DEBUG

For example:

    export WLLVM_OUTPUT_LEVEL=DEBUG

Output will be directed to the standard error stream, unless you specify the path of a logfile via the WLLVM_OUTPUT_FILE environment variable. The AUDIT level, new in 2022, logs only the calls to the compiler, and indicates whether each call is compiling or linking, the compiler used, and the arguments provided.

For example:

    export WLLVM_OUTPUT_FILE=/tmp/gllvm.log

Dragons Begone

gllvm does not support the dragonegg plugin.

Sanity Checking

Too many environment variables? Try doing a sanity check:

gsanity-check

it might point out what is wrong.

Under the hood

Both the wllvm and gllvm toolsets do much the same thing, but the way they do it is slightly different. The gllvm toolset's code base is written in Go, and is largely derived from wllvm's Python code base.

Both generate object files and bitcode files using the compiler. wllvm can use gcc and dragonegg, whereas gllvm can only use clang. The gllvm toolset does these two tasks in parallel, while wllvm does them sequentially. This, together with the slowness of Python's fork-and-exec and its interpreted nature, accounts for the large efficiency gap between the two toolsets.

Both inject the path of the bitcode version of the .o file into a dedicated segment of the .o file itself. This segment is the same across toolsets, so extracting the bitcode can be done by the appropriate tool in either toolset. On *nix both toolsets use objcopy to add the segment, while on OS X they use ld.

When the object files are linked into the resulting library or executable, the bitcode path segments are appended, so the resulting binary contains the paths of all the bitcode files that constitute the binary. To extract the sections the gllvm toolset uses the golang packages "debug/elf" and "debug/macho", while the wllvm toolset uses objdump on *nix, and otool on OS X.

Both tools then use llvm-link or llvm-ar to combine the bitcode files into the desired form.

Customization under the hood.

You can specify the exact version of objcopy and ld that gllvm uses to manipulate the artifacts by setting the GLLVM_OBJCOPY and GLLVM_LD environment variables. For more details of what's under the gllvm hood, try

gsanity-check -e

Customizing the BitCode Generation (e.g. LTO)

In some situations it is desirable to pass certain flags to clang in the step that produces the bitcode. This can be fulfilled by setting the LLVM_BITCODE_GENERATION_FLAGS environment variable to the desired flags, for example "-flto -fwhole-program-vtables".

In other situations it is desirable to pass certain flags to llvm-link in the step that merges multiple individual bitcode files together (i.e., within get-bc). This can be fulfilled by setting the LLVM_LINK_FLAGS environment variable to the desired flags, for example "-internalize -only-needed".

Beware of link time optimization.

If the package you are building takes advantage of recent clang developments such as link time optimization (indicated by the presence of the compiler flag -flto), then your build is unlikely to produce anything that get-bc will work on. This is to be expected: under these flags the compiler produces object files that are themselves bitcode. Your only recourse is to save these object files and retrieve them yourself. This can be done by setting LTO_LINKING_FLAGS to something like "-g -Wl,-plugin-opt=save-temps", which will be appended to the flags at link time. This will at least preserve the bitcode files, even if get-bc will not be able to retrieve them for you.

Developer tools

Debugging usually boils down to looking in the logs, maybe adding a print statement or two. There is an additional executable, not mentioned above, called gparse that gets installed along with gclang, gclang++, gflang, get-bc and gsanity-check. gparse takes the command line arguments to the compiler, and outputs how it parsed them. This can sometimes be helpful.

License

gllvm is released under a BSD license. See the file LICENSE for details.


This material is based upon work supported by the National Science Foundation under Grant ACI-1440800. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
