clang-tutor
Example Clang plugins for C and C++ - based on Clang 16
clang-tutor is a collection of self-contained reference Clang plugins. It's a tutorial that targets novice and aspiring Clang developers. Key features:
- Modern - based on the latest version of Clang (and updated with every release)
- Complete - includes build scripts, LIT tests and CI set-up
- Out of tree - builds against a binary Clang installation (no need to build Clang from sources)
Corrections and feedback always welcome!
Overview
Clang (together with LibTooling) provides a very powerful API and infrastructure for analysing and modifying source files from the C language family. With Clang's plugin framework one can relatively easily create bespoke tools that aid development and improve productivity. The aim of clang-tutor is to showcase this framework through small, self-contained and testable examples, implemented using idiomatic LLVM.
This document explains how to set-up your environment, build and run the project, and go about debugging. The source files, apart from the code itself, contain comments that will guide you through the implementation. The tests highlight what edge cases are supported, so you may want to skim through them as well.
HelloWorld
The HelloWorld plugin from HelloWorld.cpp is a self-contained reference example. The corresponding CMakeLists.txt implements the minimum set-up for an out-of-tree plugin.
HelloWorld extracts some interesting information from the input translation unit. It visits all C++ record declarations (more specifically class, struct and union declarations) and counts them. Recall that a translation unit consists of the input source file and all the header files that it includes (directly or indirectly).
HelloWorld prints the results on a file by file basis, i.e. separately for every header file that has been included. It visits all declarations - including the ones in header files included by other header files. This may lead to some surprising results!
You can build and run HelloWorld like this:
# Build the plugin
export Clang_DIR=<installation/dir/of/clang/16>
export CLANG_TUTOR_DIR=<source/dir/clang/tutor>
mkdir build
cd build
cmake -DCT_Clang_INSTALL_DIR=$Clang_DIR $CLANG_TUTOR_DIR/HelloWorld/
make
# Run the plugin
$Clang_DIR/bin/clang -cc1 -load ./libHelloWorld.{so|dylib} -plugin hello-world $CLANG_TUTOR_DIR/test/HelloWorld-basic.cpp
You should see the following output:
# Expected output
(clang-tutor) file: <source/dir/clang/tutor>/test/HelloWorld-basic.cpp
(clang-tutor) count: 3
How To Analyze STL Headers
In order to see what happens with multiple indirectly included header files, you can run HelloWorld on one of the header files from the Standard Template Library. For example, you can use the following C++ file that simply includes the vector header:
// file.cpp
#include <vector>
When running a Clang plugin on a C++ file that includes headers from STL, it is easier to run it with clang++ (rather than clang -cc1) like this:
$Clang_DIR/bin/clang++ -c -Xclang -load -Xclang libHelloWorld.dylib -Xclang -plugin -Xclang hello-world file.cpp
This way you can be confident that all the necessary include paths (required to locate STL headers) are automatically added. For the above input file, HelloWorld will print:
- an overview of all header files included when using #include <vector>, and
- the number of C++ records declared in each.
Note that there are no explicit declarations in file.cpp and only one header file is included. However, the output on my system consists of 37 header files (one of which contains 371 declarations). Note that the actual output depends on your host OS, the C++ standard library implementation and its version. Your results are likely to be different.
Development Environment
Platform Support And Requirements
clang-tutor has been tested on Ubuntu 20.04 and Mac OS X 10.14.6. In order to build clang-tutor you will need:
- LLVM 16 and Clang 16
- C++ compiler that supports C++17
- CMake 3.13.4 or higher
As Clang is a subproject within llvm-project, it depends on LLVM (i.e. clang-tutor requires development packages for both Clang and LLVM).
There are additional requirements for tests (these are usually satisfied by installing LLVM 16, although lit may have to be installed separately, as described in Building & Testing):
- lit (aka llvm-lit, LLVM tool for executing the tests)
- FileCheck (LIT requirement, it's used to check whether tests generate the expected output)
Installing Clang 16 On Mac OS X
On Darwin you can install Clang 16 and LLVM 16 with Homebrew:
brew install llvm
If you already have an older version of Clang and LLVM installed, you can upgrade it to Clang 16 and LLVM 16 like this:
brew upgrade llvm
Once the installation (or upgrade) is complete, all the required header files, libraries and tools will be located in /usr/local/opt/llvm/.
Installing Clang 16 On Ubuntu
On Ubuntu Jammy Jellyfish, you can install modern LLVM from the official repository:
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-16 main"
sudo apt-get update
sudo apt-get install -y llvm-16 llvm-16-dev libllvm16 llvm-16-tools clang-16 libclang-common-16-dev libclang-16-dev libmlir-16 libmlir-16-dev
This will install all the required header files, libraries and tools in /usr/lib/llvm-16/.
Building Clang 16 From Sources
Building from sources can be slow and tricky to debug. It is not necessary, but might be your preferred way of obtaining LLVM/Clang 16. The following steps will work on Linux and Mac OS X:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout release/16.x
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi" <llvm-project/root/dir>/llvm/
cmake --build .
For more details read the official documentation.
Note for macOS users
As per this great description by Arthur O’Dwyer, add -DDEFAULT_SYSROOT="$(xcrun --show-sdk-path)" to your CMake invocation when building Clang from sources. Otherwise, clang won't be able to find standard C headers (e.g. wchar.h).
Building & Testing
You can build clang-tutor (and all the provided plugins) as follows:
cd <build/dir>
cmake -DCT_Clang_INSTALL_DIR=<installation/dir/of/clang/16> <source/dir/clang-tutor>
make
The CT_Clang_INSTALL_DIR variable should be set to the root of either the installation or build directory of Clang 16. It is used to locate the corresponding LLVMConfig.cmake script that is used to set the include and library paths.
In order to run the tests, you need to install llvm-lit (aka lit). It's not bundled with LLVM 16 packages, but you can install it with pip:
# Install lit - note that this installs lit globally
pip install lit
Running the tests is as simple as:
$ lit <build_dir>/test
Voilà! You should see all tests passing.
Overview of The Plugins
This table contains a summary of the examples available in clang-tutor. The Framework column refers to a plugin framework available in Clang that was used to implement the corresponding example. This is either RecursiveASTVisitor, ASTMatcher or both.
Name | Description | Framework |
---|---|---|
HelloWorld | counts the number of class, struct and union declarations in the input translation unit | RecursiveASTVisitor |
LACommenter | adds comments to literal arguments in function calls | ASTMatcher |
CodeStyleChecker | issues a warning if the input file does not follow one of LLVM's coding style guidelines | RecursiveASTVisitor |
Obfuscator | obfuscates integer addition and subtraction | ASTMatcher |
UnusedForLoopVar | issues a warning if a for-loop variable is not used | RecursiveASTVisitor + ASTMatcher |
CodeRefactor | renames class/struct method names | ASTMatcher |
Once you've built this project, you can experiment with every plugin separately. All of them accept C and C++ files as input. Below you will find more detailed descriptions (except for HelloWorld, which is documented above).
LACommenter
The LACommenter (Literal Argument Commenter) plugin will comment literal arguments in function calls. For example, in the following input code:
extern void foo(int some_arg);
void bar() {
foo(123);
}
LACommenter will decorate the invocation of foo as follows:
extern void foo(int some_arg);
void bar() {
foo(/*some_arg=*/123);
}
This commenting style follows LLVM's official guidelines. LACommenter will comment character, integer, floating point, boolean and string literal arguments.
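The shape of the inserted comment can be sketched in a few lines of plain C++. This is only an illustration of the text that ends up in the source file, not the plugin's implementation (which works on the Clang AST); the helper names commentLiteralArg and checkCommentLiteralArg are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of clang-tutor: builds the text that replaces
// a literal argument, given the parameter name from the callee's declaration.
std::string commentLiteralArg(const std::string &ParamName,
                              const std::string &Literal) {
  return "/*" + ParamName + "=*/" + Literal;
}

// Small self-check based on the foo(123) example above.
void checkCommentLiteralArg() {
  assert(commentLiteralArg("some_arg", "123") == "/*some_arg=*/123");
}
```

Calling checkCommentLiteralArg() from a test or from main() confirms that the helper reproduces the decorated argument from the example.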
This plugin is based on a similar example by Peter Smith presented here.
Run the plugin
You can test LACommenter on the example presented above. Assuming that it was saved in input_file.cpp, you can add comments to it as follows:
$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libLACommenter.dylib -plugin LAC input_file.cpp
Run the plugin through ct-la-commenter
ct-la-commenter is a standalone tool that will run the LACommenter plugin, but without the need of using clang and loading the plugin:
<build_dir>/bin/ct-la-commenter input_file.cpp --
If you don't append -- at the end of the tool's invocation, you will get the following complaint from Clang tools about a missing compilation database:
Error while trying to load a compilation database:
Could not auto-detect compilation database for file "input_file.cpp"
No compilation database found in <source/dir/clang-tutor> or any parent directory
fixed-compilation-database: Error while opening fixed database: No such file or directory
json-compilation-database: Error while opening JSON database: No such file or directory
Running without flags.
Another way to solve the issue is to set the CMAKE_EXPORT_COMPILE_COMMANDS flag during the CMake invocation. It will generate a compilation database named compile_commands.json in your build directory. A more detailed explanation can be found on Eli Bendersky's blog.
CodeStyleChecker
This plugin demonstrates how to use Clang's DiagnosticEngine to generate custom compiler warnings. Essentially, CodeStyleChecker checks whether names of classes, functions and variables in the input translation unit adhere to LLVM's style guide. If not, a warning is printed. For every warning, CodeStyleChecker generates a suggestion that would fix the corresponding issue. This is done with the FixItHint API. The SourceLocation API is used to generate valid source locations.
CodeStyleChecker is robust enough to cope with complex examples like the vector header from STL, yet the actual implementation is fairly compact. For example, it can correctly analyze names expanded from macros and knows that it should ignore user-defined conversion operators.
Run the plugin
Let's test CodeStyleChecker on the following file:
// file.cpp
class clangTutor_BadName;
The name of the class doesn't follow LLVM's coding guide and CodeStyleChecker indeed captures that:
$Clang_DIR/bin/clang -cc1 -fcolor-diagnostics -load libCodeStyleChecker.dylib -plugin CSC file.cpp
file.cpp:2:7: warning: Type and variable names should start with upper-case letter
class clangTutor_BadName;
^~~~~~~~~~~~~~~~~~~
ClangTutor_BadName
file.cpp:2:17: warning: `_` in names is not allowed
class clangTutor_BadName;
~~~~~~~~~~^~~~~~~~~
clangTutorBadName
2 warnings generated.
There are two warnings generated as two rules have been violated. Alongside every warning, a suggestion (i.e. a FixItHint) is printed that would make the corresponding warning go away. Note that CodeStyleChecker also supplements the warnings with correct source code information.
-fcolor-diagnostics above instructs Clang to generate color output (unfortunately Markdown doesn't render the colors here).
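The two suggestions printed in the diagnostics above can be mimicked with plain string manipulation. The following is a minimal sketch, assuming only the two rules visible in that output; the helper names are hypothetical and the plugin itself produces its suggestions through Clang's FixItHint API rather than with code like this.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helpers that mimic the two suggestions shown in the
// diagnostics above; they are not the plugin's implementation.

// Rule 1: type and variable names should start with an upper-case letter.
std::string capitalizeFirstLetter(std::string Name) {
  if (!Name.empty())
    Name[0] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(Name[0])));
  return Name;
}

// Rule 2: `_` is not allowed in names.
std::string dropUnderscores(std::string Name) {
  Name.erase(std::remove(Name.begin(), Name.end(), '_'), Name.end());
  return Name;
}

// Small self-check based on the class name from file.cpp.
void checkSuggestions() {
  assert(capitalizeFirstLetter("clangTutor_BadName") == "ClangTutor_BadName");
  assert(dropUnderscores("clangTutor_BadName") == "clangTutorBadName");
}
```

Each helper addresses one rule in isolation, which matches the way the two warnings above are reported separately, each with its own suggestion.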
Run the plugin through ct-code-style-checker
ct-code-style-checker is a standalone tool that will run the CodeStyleChecker plugin, but without the need of using clang and loading the plugin:
<build_dir>/bin/ct-code-style-checker input_file.cpp --
Obfuscator
The Obfuscator plugin will rewrite integer addition and subtraction according to the following formulae:
a + b == (a ^ b) + 2 * (a & b)
a - b == (a + ~b) + 1
The above transformations are often used in code obfuscation. You may also know them from Hacker's Delight.
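The two formulae above are easy to check with a few lines of C++. The sketch below uses unsigned 32-bit integers so that wrap-around is well defined (the plugin itself rewrites expressions in the user's source code, and signed overflow would be undefined behaviour in a check like this one). The function names are hypothetical and not part of clang-tutor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names; these are not part of clang-tutor.
// Unsigned arithmetic is used so that wrap-around is well defined.

// a + b rewritten as (a ^ b) + 2 * (a & b)
constexpr std::uint32_t obfuscatedAdd(std::uint32_t A, std::uint32_t B) {
  return (A ^ B) + 2u * (A & B);
}

// a - b rewritten as (a + ~b) + 1
constexpr std::uint32_t obfuscatedSub(std::uint32_t A, std::uint32_t B) {
  return (A + ~B) + 1u;
}

// Compares both rewrites against plain + and - for a handful of sample
// values, including values that trigger wrap-around.
void checkIdentities() {
  const std::uint32_t Samples[] = {0u, 1u, 7u, 13u, 0x7FFFFFFFu, 0xFFFFFFFFu};
  for (std::uint32_t A : Samples)
    for (std::uint32_t B : Samples) {
      assert(obfuscatedAdd(A, B) == A + B);
      assert(obfuscatedSub(A, B) == A - B);
    }
}
```

The addition identity works because a ^ b adds the bits without carries, while a & b marks the positions that produce a carry, and multiplying by 2 shifts those carries one position to the left. The subtraction identity follows from two's complement, where ~b equals -b - 1.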
The plugin runs twice over the input file. First it scans for integer additions. If any are found, the input file is updated and printed to stdout. If there are no integer additions, there is no output. Similar logic is implemented for integer subtraction.
Similar code transformations are possible at the LLVM IR level. In particular, see MBASub and MBAAdd in llvm-tutor.
Run the plugin
Let's use the following file as our input:
int foo(int a, int b) {
return a + b;
}
You can run the plugin like this:
$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libObfuscator.dylib -plugin Obfuscator input.cpp
You should see the following output on your screen.
int foo(int a, int b) {
return (a ^ b) + 2 * (a & b);
}
UnusedForLoopVar
This plugin detects unused for-loop variables (more specifically, the variables defined inside the traditional and range-based for loop statements) and issues a warning when one is found. For example, in function foo the loop variable j is not used:
int foo(int var_a) {
for (int j = 0; j < 10; j++)
var_a++;
return var_a;
}
UnusedForLoopVar will warn you about it. Clearly the for loop in this case can be replaced with var_a += 10;, so UnusedForLoopVar does a great job in drawing the developer's attention to it. It can also detect unused loop variables in range for loops, for example:
#include <vector>
int bar(std::vector<int> var_a) {
int var_b = 10;
for (auto some_integer: var_a)
var_b++;
return var_b;
}
In this case, some_integer is not used and UnusedForLoopVar will highlight it. The loop could be replaced with a much simpler expression: var_b += var_a.size();.
Obviously unused loop variables may indicate an issue or a potential optimisation (e.g. unroll the loop) or a simplification (e.g. replace the loop with one arithmetic operation). However, that does not have to be the case and sometimes we have good reasons not to use the loop variable.
If the name of a loop variable matches the regular expression [U|u][N|n][U|u][S|s][E|e][D|d], then it will be ignored by UnusedForLoopVar. For example, the following modified version of the above example will not be reported:
int foo(int var_a) {
for (int unused = 0; unused < 10; unused++)
var_a++;
return var_a;
}
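The exemption can be tried out in isolation with std::regex. The sketch below is not taken from the plugin: the helper names are hypothetical, and it assumes that the whole variable name has to match (the plugin's matching may be looser, e.g. a partial match). The pattern is written without the | characters, which are literal inside a character class and cannot appear in an identifier anyway.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not taken from clang-tutor. Assumption: the whole
// variable name has to spell "unused", in any mix of upper and lower case.
bool isExemptLoopVarName(const std::string &Name) {
  static const std::regex UnusedPattern("[Uu][Nn][Uu][Ss][Ee][Dd]");
  return std::regex_match(Name, UnusedPattern);
}

// Small self-check using the loop variable names from the examples above.
void checkExemptNames() {
  assert(isExemptLoopVarName("unused"));
  assert(isExemptLoopVarName("UnUsEd"));
  assert(!isExemptLoopVarName("j"));
  assert(!isExemptLoopVarName("some_integer"));
}
```

Under the whole-name assumption, only spellings of "unused" are exempt; the loop variables j and some_integer from the earlier examples would still be reported.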
UnusedForLoopVar mixes both the ASTMatcher and RecursiveASTVisitor frameworks. It is an example of how to leverage both of them to solve a slightly more complex problem. The generated warnings are labelled so that you can see which framework was used to capture a particular case of an unused for-loop variable. For example, for the first example above you will get the following warning:
warning: (Recursive AST Visitor) regular for-loop variable not used
The second example leads to the following warning:
warning: (AST Matcher) range for-loop variable not used
Reading the source code should help you understand why different frameworks are needed in different cases. I have also added a few test files that you can use as reference examples (e.g. UnusedForLoopVar_regular_loop.cpp).
Run the plugin
$Clang_DIR/bin/clang -cc1 -fcolor-diagnostics -load <build_dir>/lib/libUnusedForLoopVar.dylib -plugin UFLV input.cpp
CodeRefactor
This plugin will rename a specified member method in a class (or a struct) and in all classes derived from it. It will also update all call sites in which the method is used so that the code remains semantically correct.
The following example contains all cases supported by CodeRefactor.
// file.cpp
struct Base {
virtual void foo() {};
};
struct Derived: public Base {
void foo() override {};
};
void StaticDispatch() {
Base B;
Derived D;
B.foo();
D.foo();
}
void DynamicDispatch() {
Base *B = new Base();
Derived *D = new Derived();
B->foo();
D->foo();
}
We will use CodeRefactor to rename Base::foo as Base::bar. Note that this consists of two steps:
- update the declaration and the definition of foo in the base class (i.e. Base) as well as in all the derived classes (i.e. Derived)
- update all call sites that use static dispatch (e.g. B.foo()) and dynamic dispatch (e.g. B->foo()).
CodeRefactor will do all this refactoring for you! See below how to run it.
The implementation of CodeRefactor is rather straightforward, but it can only operate on one file at a time. clang-rename is much more powerful in this respect.
Run the plugin
CodeRefactor requires 3 command line arguments: -class-name, -old-name, -new-name. Hopefully these are self-explanatory. Passing the arguments to the plugin is a bit cumbersome and probably best demonstrated with an example:
$Clang_DIR/bin/clang -cc1 -load <build_dir>/lib/libCodeRefactor.dylib -plugin CodeRefactor -plugin-arg-CodeRefactor -class-name -plugin-arg-CodeRefactor Base -plugin-arg-CodeRefactor -old-name -plugin-arg-CodeRefactor foo -plugin-arg-CodeRefactor -new-name -plugin-arg-CodeRefactor bar file.cpp
It is much easier when you run the plugin through a stand-alone tool like ct-code-refactor!
Run the plugin through ct-code-refactor
ct-code-refactor is a standalone tool that is basically a wrapper for CodeRefactor. You can use it to refactor your input file as follows:
<build_dir>/bin/ct-code-refactor --class-name=Base --new-name=bar --old-name=foo file.cpp --
ct-code-refactor uses LLVM's CommandLine 2.0 library for parsing command line arguments. It is very well documented, relatively easy to integrate and the end result is a very intuitive interface.
References
Below is a list of Clang resources available outside the official online documentation that I have found very helpful.
- Resources inside Clang
- Refactoring tool template: clang-tools-extra/tool-template
- AST Matcher Reference
- Clang Tool Development
- Diagnostics
- "Emitting Diagnostics in Clang", Peter Goldsborough (blog post)
- Projects That Use Clang Plugins
- Mozilla: official documentation on static analysis in Firefox, custom ASTMatchers
- Chromium: official documentation on using clang plugins, in-tree source code
- LibreOffice: official documentation on developing Clang plugins, in-tree source code
- clang-query
License
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.
In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to http://unlicense.org/