clang-metatool - A framework for reusing code in clang tools

About clangmetatool

When we first started writing clang tools, we realized that there was a lot of life-cycle management that we had to repeat. In some cases, people advocate the use of global variables to manage the life cycle of that data, but that makes code reuse across tools even harder.

Additionally, we learned that when writing a tool, it is beneficial to split the code into two phases: first a data collection phase, and later a post-processing phase that actually performs the bulk of the logic of the tool.

Essentially you will only need to write a class like:

class MyTool {
private:
  SomeDataCollector collector1;
  SomeOtherDataCollector collector2;
public:
  MyTool(clang::CompilerInstance* ci, clang::ast_matchers::MatchFinder *f)
   :collector1(ci, f), collector2(ci, f) {
   // the individual collectors will register their callbacks in their
   // constructor, the tool doesn't really need to do anything else here.
  }
  void postProcessing
  (std::map<std::string, clang::tooling::Replacements> &replacementsMap) {
   // use data from collector1 and collector2
   // generate warnings and notices
   // add replacements to replacementsMap
  }
};

And then you can use the clangmetatool::MetaToolFactory combined with the clangmetatool::MetaTool in your tool's main function:

int main(int argc, const char* argv[]) {
  llvm::cl::OptionCategory MyToolCategory("my-tool options");
  llvm::cl::extrahelp CommonHelp
    (clang::tooling::CommonOptionsParser::HelpMessage);
  clang::tooling::CommonOptionsParser
    optionsParser(argc, argv, MyToolCategory);
  clang::tooling::RefactoringTool tool(optionsParser.getCompilations(),
                                       optionsParser.getSourcePathList());
  clangmetatool::MetaToolFactory< clangmetatool::MetaTool<MyTool> >
    raf(tool.getReplacements());
  int r = tool.runAndSave(&raf);
  return r;
}

One way in which our initial tools got hard to write and maintain was by trying to perform analysis or even replacements during the callbacks. It was not immediately obvious that this would lead to hard-to-maintain code. After we switched to the two-phase approach, we were able to reuse a lot more code across tools.

Infrastructure

clangmetatool::MetaToolFactory

This provides the boilerplate for a refactoring tool action, since you need a factory that passes the replacementsMap in to the frontend action class.

clangmetatool::MetaTool

This provides the boilerplate of a FrontendAction class that will perform data gathering and then run a post-processing phase that may do replacements. Writing a tool is thereby reduced to a constructor that registers preprocessor callbacks or AST matchers, plus a post-processing phase.
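The life cycle described above can be illustrated with a toy model that does not depend on clang at all. This is only a sketch: ToyFinder, ToyReplacements, ToyMetaTool and CountingTool are hypothetical stand-ins invented for this example, not the clangmetatool or clang API. It shows the order of operations only: the tool registers callbacks in its constructor, events are delivered during collection, and postProcessing runs once at the end.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for clang::ast_matchers::MatchFinder: it only stores callbacks.
struct ToyFinder {
  std::vector<std::function<void(const std::string &)>> callbacks;
  void addCallback(std::function<void(const std::string &)> cb) {
    callbacks.push_back(std::move(cb));
  }
};

// Stand-in for std::map<std::string, clang::tooling::Replacements>.
using ToyReplacements = std::map<std::string, std::vector<std::string>>;

// Hypothetical driver that mimics the two-phase life cycle.
template <typename Tool>
class ToyMetaTool {
  ToyReplacements &replacements;

public:
  explicit ToyMetaTool(ToyReplacements &r) : replacements(r) {}

  // "events" plays the role of whatever the compiler reports while parsing.
  void run(const std::vector<std::string> &events) {
    ToyFinder finder;
    Tool tool(&finder);  // the tool registers its callbacks here
    assert(!finder.callbacks.empty() && "tool registered no callbacks");
    for (const auto &e : events)  // phase 1: data collection
      for (const auto &cb : finder.callbacks) cb(e);
    tool.postProcessing(replacements);  // phase 2: post-processing
  }
};

// Example tool: records every event, then emits one replacement per event.
class CountingTool {
  std::vector<std::string> seen;

public:
  explicit CountingTool(ToyFinder *f) {
    f->addCallback([this](const std::string &e) { seen.push_back(e); });
  }
  void postProcessing(ToyReplacements &out) {
    for (const auto &e : seen) out["input.cpp"].push_back("rename " + e);
  }
};

// Usage: run the toy driver over two fake events.
ToyReplacements runToyExample() {
  ToyReplacements result;
  ToyMetaTool<CountingTool> action(result);
  std::vector<std::string> events = {"foo", "bar"};
  action.run(events);
  return result;
}
```

Note that the callback does nothing but store data; every decision about replacements is deferred to postProcessing, which is the separation this README argues for.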

clangmetatool cmake module

When building a clang tool you are expected to ship the builtin headers from the compiler with the tool, otherwise the tool will fail to find headers like stdarg.h. Clang expects to find the builtin headers relative to the absolute path of where the tool is installed. This cmake module provides a function called clangmetatool_install which handles all of that for you; see skeleton/CMakeLists.txt for an example.

Reusable data types

This defines types that can be used as building blocks. They are in the clangmetatool::types namespace.

Reusable data collection

Another part of this library consists of a number of "Data Collectors". Those will be in the clangmetatool::collectors namespace.

"Data Collector" is a "design pattern" for reusing code in clang tools. It works by having a class that takes the CompilerInstance object as well as the match finder in its constructor and registers all the callbacks it needs, so that the collected data is available later.

The collector class will also have a "getData" method that will return the pointer to a struct with the data. The "getData" method should only be called in the 'post-processing' phase of the tool.
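As a rough illustration of the pattern, here is a self-contained sketch that replaces clang with a small fake. FakeMatchFinder, Event, IncludeData and IncludeCollector are hypothetical names made up for this example; they are not types from clangmetatool::collectors, and a real collector would also receive the CompilerInstance. The shape is what matters: registration in the constructor, a data struct, and a getData method that is only called after collection has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// An event is a (kind, name) pair, e.g. {"include", "vector"}.
using Event = std::pair<std::string, std::string>;

// Stand-in for the match finder: stores callbacks and replays events to them.
class FakeMatchFinder {
  std::vector<std::function<void(const Event &)>> callbacks;

public:
  void addCallback(std::function<void(const Event &)> cb) {
    callbacks.push_back(std::move(cb));
  }
  void replay(const std::vector<Event> &events) const {
    for (const auto &e : events)
      for (const auto &cb : callbacks) cb(e);
  }
};

// The data struct that the collector fills in and later hands out.
struct IncludeData {
  std::set<std::string> headers;
};

// Hypothetical collector: registers its callback in the constructor and
// exposes the result only through getData().
class IncludeCollector {
  IncludeData data;

public:
  explicit IncludeCollector(FakeMatchFinder *f) {
    f->addCallback([this](const Event &e) {
      if (e.first == "include") data.headers.insert(e.second);
    });
  }
  // The registered callback holds `this`, so copying would be unsafe.
  IncludeCollector(const IncludeCollector &) = delete;
  IncludeCollector &operator=(const IncludeCollector &) = delete;

  IncludeData *getData() { return &data; }
};

// Collection followed by post-processing; returns the number of distinct
// headers seen in the event stream.
std::size_t countDistinctIncludes(const std::vector<Event> &events) {
  FakeMatchFinder finder;
  IncludeCollector collector(&finder);   // registration
  finder.replay(events);                 // data collection phase
  IncludeData *d = collector.getData();  // post-processing starts here
  assert(d != nullptr);
  return d->headers.size();
}
```

Because the collector owns its data and its callback, a tool can hold several collectors as members and combine their results in post-processing without any global state.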

Constant Propagation

Another part of this library consists of constant propagators to assist with analysis. Those will be in the clangmetatool::propagation namespace.

More specifically, the current implementation provides propagation for the following types, so that variables may be queried for their true values anywhere within the control flow, so long as the value is deterministic:

  • Constant C-style string propagator, which propagates constant strings through the control flow graph

  • Constant integer propagator, which propagates integer values through the code, taking into account references, pointers, and the const-ness of int and int-like types

This could be useful for various purposes, but especially for identifying things like which database a function is actually calling out to.

clangmetatool::propagation::ConstantCStringPropagator

This provides infrastructure (utilizing clangmetatool::propagation::ConstantPropagator and clangmetatool::propagation::PropagationVisitor) to propagate constant C-style string values over the program. The result is the true value of a variable wherever the value is deterministic and "" anywhere else.
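The "deterministic value or empty string" behavior can be pictured with a small toy that merges what is known at the end of several branches. This is not the library's implementation and does not use its API: State, join, lookup and demoBranchMerge are hypothetical names for this sketch, and the real propagator works on clang's control flow graph rather than on hand-written maps.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// State at one program point: variable name -> known constant string.
// A variable that is absent from the map is treated as not deterministic.
using State = std::map<std::string, std::string>;

// Join the states of several predecessor blocks: a variable stays known
// only if every predecessor agrees on its value.
State join(const std::vector<State> &preds) {
  if (preds.empty()) return {};
  State out = preds.front();
  for (std::size_t i = 1; i < preds.size(); ++i) {
    for (auto it = out.begin(); it != out.end();) {
      auto found = preds[i].find(it->first);
      if (found == preds[i].end() || found->second != it->second)
        it = out.erase(it);  // disagreement: value is no longer deterministic
      else
        ++it;
    }
  }
  return out;
}

// Query: the constant value if it is known, "" otherwise.
std::string lookup(const State &s, const std::string &var) {
  auto it = s.find(var);
  return it == s.end() ? std::string() : it->second;
}

// Usage: two branches assign different values to "db" but the same value
// to "table", so only "table" survives the merge.
void demoBranchMerge() {
  State thenBranch = {{"db", "prod"}, {"table", "users"}};
  State elseBranch = {{"db", "test"}, {"table", "users"}};
  State merged = join({thenBranch, elseBranch});
  assert(lookup(merged, "table") == "users");
  assert(lookup(merged, "db") == "");
}
```

One consequence of using "" as the answer for unknown values, in this toy at least, is that a variable whose real value is the empty string cannot be told apart from an unknown one.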

clangmetatool::propagation::ConstantPropagator and clangmetatool::propagation::PropagationVisitor

These two classes provide the boilerplate to create infrastructure to propagate constants of arbitrary types through the control flow graph of the program in such a way that anywhere the constant value of a variable would be deterministic one may query its value at that point.

These classes are private to the library, but additional propagators could be easily made using these facilities.

Skeleton for a new project

After you "git init" into an empty directory, copy the contents of the skeleton directory. To build that project, do something like:

( mkdir -p build && cd build/ && \
  cmake \
  -DClang_DIR=/path/to/clang/ \
  -Dclangmetatool_DIR=/path/to/clang/ \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
  .. )
make -C build

Building

You need a full llvm+clang installation directory. Unfortunately, the Debian and Ubuntu packages are broken, so you may need to work around this by creating some symlinks (see .travis.Dockerfile in this repo for an example).

mkdir build
cd build
cmake -DClang_DIR=/path/to/clang/cmake ..
make
make install

License and Copyright

// ----------------------------------------------------------------------------
// Copyright 2018 Bloomberg Finance L.P.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ----------------------------- END-OF-FILE ----------------------------------
