• Stars
    star
    1,857
  • Rank 24,974 (Top 0.5 %)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created about 3 years ago
  • Updated about 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.

weggli

weggli example

Introduction

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.

weggli performs pattern matching on Abstract Syntax Trees based on user provided queries. Its query language resembles C and C++ code, making it easy to turn interesting code patterns into queries.

weggli is inspired by great tools like Semgrep, Coccinelle, joern and CodeQL, but makes some different design decisions:

  • C++ support: weggli has first class support for modern C++ constructs, such as lambda expressions, range-based for loops and constexprs.

  • Minimal setup: weggli should work out-of-the box against most software you will encounter. weggli does not require the ability to build the software and can work with incomplete sources or missing dependencies.

  • Interactive: weggli is designed for interactive usage and fast query performance. Most of the time, a weggli query will be faster than a grep search. The goal is to enable an interactive workflow where quick switching between code review and query creation/improvement is possible.

  • Greedy: weggli's pattern matching is designed to find as many (useful) matches as possible for a specific query. While this increases the risk of false positives it simplifies query creation. For example, the query $x = 10; will match both assignment expressions (foo = 10;) and declarations (int bar = 10;).

Usage

Use -h for short descriptions and --help for more details.

 Homepage: https://github.com/googleprojectzero/weggli

 USAGE: weggli [OPTIONS] <PATTERN> <PATH>

 ARGS:
     <PATTERN>
            A weggli search pattern. weggli's query language closely resembles
             C and C++ with a small number of extra features.

             For example, the pattern '{_ $buf[_]; memcpy($buf,_,_);}' will
             find all calls to memcpy that directly write into a stack buffer.

             Besides normal C and C++ constructs, weggli's query language
             supports the following features:

             _        Wildcard. Will match on any AST node.

             $var     Variables. Can be used to write queries that are independent
                      of identifiers. Variables match on identifiers, types,
                      field names or namespaces. The --unique option
                      optionally enforces that $x != $y != $z. The --regex option can
                      enforce that the variable has to match (or not match) a
                      regular expression.

             _(..)    Subexpressions. The _(..) wildcard matches on arbitrary
                      sub expressions. This can be helpful if you are looking for some
                      operation involving a variable, but don't know more about it.
                      For example, _(test) will match on expressions like test+10,
                      buf[test->size] or f(g(&test));

             not:     Negative sub queries. Only show results that do not match the
                      following sub query. For example, '{not: $fv==NULL; not: $fv!=NULL *$v;}'
                      would find pointer dereferences that are not preceded by a NULL check.

            strict:   Enable stricter matching. This turns off statement unwrapping 
                      and greedy function name matching. For example 'strict: func();' 
                      will not match on 'if (func() == 1)..' or 'a->func()' anymore.

             weggli automatically unwraps expression statements in the query source
             to search for the inner expression instead. This means that the query `{func($x);}`
             will match on `func(a);`, but also on `if (func(a)) {..}` or  `return func(a)`.
             Matching on `func(a)` will also match on `func(a,b,c)` or `func(z,a)`.
             Similarly, `void func($t $param)` will also match function definitions
             with multiple parameters.

             Additional patterns can be specified using the --pattern (-p) option. This makes
             it possible to search across functions or type definitions.

    <PATH>
            Input directory or file to search. By default, weggli will search inside
             .c and .h files for the default C mode or .cc, .cpp, .cxx, .h and .hpp files when
             executing in C++ mode (using the --cpp option).
             Alternative file endings can be specified using the --extensions (-e) option.

             When combining weggli with other tools or preprocessing steps,
             files can also be specified via STDIN by setting the directory to '-'
             and piping a list of filenames.


 OPTIONS:
     -A, --after <after>
            Lines to print after a match. Default = 5.

    -B, --before <before>
            Lines to print before a match. Default = 5.

    -C, --color
            Force enable color output.

    -X, --cpp
            Enable C++ mode.

        --exclude <exclude>...
            Exclude files that match the given regex.

    -e, --extensions <extensions>...
            File extensions to include in the search.

    -f, --force
            Force a search even if the queries contains syntax errors.

    -h, --help
            Prints help information.

        --include <include>...
            Only search files that match the given regex.

    -l, --limit
            Only show the first match in each function.

    -p, --pattern <p>...
            Specify additional search patterns.

    -R, --regex <regex>...
            Filter variable matches based on a regular expression.
             This feature uses the Rust regex crate, so most Perl-style
             regular expression features are supported.
             (see https://docs.rs/regex/1.5.4/regex/#syntax)

             Examples:

             Find calls to functions starting with the string 'mem':
             weggli -R 'func=^mem' '$func(_);'

             Find memcpy calls where the last argument is NOT named 'size':
             weggli -R 's!=^size$' 'memcpy(_,_,$s);'

    -u, --unique
            Enforce uniqueness of variable matches.
             By default, two variables such as $a and $b can match on identical values.
             For example, the query '$x=malloc($a); memcpy($x, _, $b);' would
             match on both

             void *buf = malloc(size);
             memcpy(buf, src, size);

             and

             void *buf = malloc(some_constant);
             memcpy(buf, src, size);

             Using the unique flag would filter out the first match as $a==$b.

    -v, --verbose
            Sets the level of verbosity.

    -V, --version
            Prints version information.

Examples

Calls to memcpy that write into a stack-buffer:

weggli '{
    _ $buf[_];
    memcpy($buf,_,_);
}' ./target/src

Calls to foo that don't check the return value:

weggli '{
   strict: foo(_);
}' ./target/src

Potentially vulnerable snprintf() users:

weggli '{
    $ret = snprintf($b,_,_);
    $b[$ret] = _;
}' ./target/src

Potentially uninitialized pointers:

weggli '{ _* $p;
NOT: $p = _;
$func(&$p);
}' ./target/src

Potentially insecure WeakPtr usage:

weggli --cpp '{
$x = _.GetWeakPtr(); 
DCHECK($x); 
$x->_;}' ./target/src

Debug only iterator validation:

weggli -X 'DCHECK(_!=_.end());' ./target/src

Functions that perform writes into a stack-buffer based on a function argument.

weggli '_ $fn(_ $limit) {
    _ $buf[_];
    for (_; $i<$limit; _) {
        $buf[$i]=_;
    }
}' ./target/src

Functions with the string decode in their name

weggli -R func=decode '_ $func(_) {_;}'

Encoding/Conversion functions

weggli '_ $func($t *$input, $t2 *$output) {
    for (_($i);_;_) {
        $input[$i]=_($output);
    }
}' ./target/src

Install

$ cargo install weggli

Build Instruction

# optional: install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh 

git clone https://github.com/googleprojectzero/weggli.git
cd weggli; cargo build --release
./target/release/weggli

Implementation details

Weggli is built on top of the tree-sitter parsing library and its C and C++ grammars. Search queries are first parsed using an extended version of the corresponding grammar, and the resulting AST is transformed into a set of tree-sitter queries in builder.rs. The actual query matching is implemented in query.rs, which is a relatively small wrapper around tree-sitter's query engine to add weggli specific features.

Contributing

See CONTRIBUTING.md for details.

License

Apache 2.0; see LICENSE for details.

Disclaimer

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

More Repositories

1

winafl

A fork of AFL for fuzzing Windows binaries
C
2,311
star
2

sandbox-attacksurface-analysis-tools

Set of tools to analyze Windows sandboxes for exposed attack surface.
C#
2,047
star
3

fuzzilli

A JavaScript Engine Fuzzer
Swift
1,859
star
4

domato

DOM fuzzer
Python
1,672
star
5

TinyInst

A lightweight dynamic instrumentation library
C++
1,158
star
6

Jackalope

Binary, coverage-guided fuzzer for Windows, macOS, Linux and Android
C++
1,068
star
7

halfempty

A fast, parallel test case minimization tool.
C
941
star
8

0days-in-the-wild

Repository for information about 0-days exploited in-the-wild.
HTML
753
star
9

symboliclink-testing-tools

C++
747
star
10

p0tools

Project Zero Docs and Tools
C++
698
star
11

ktrw

An iOS kernel debugger based on a KTRR bypass for A11 iPhones; works with LLDB and IDA Pro.
C
660
star
12

functionsimsearch

Some C++ example code to demonstrate how to perform code similarity searches using SimHashing.
C++
559
star
13

BrokenType

TrueType and OpenType font fuzzing toolset
C++
430
star
14

iOS-messaging-tools

Python
368
star
15

SockFuzzer

C
367
star
16

SkCodecFuzzer

Fuzzing harness for testing proprietary image codecs supported by Skia on Android
C++
331
star
17

bochspwn

A Bochs-based instrumentation project designed to log kernel memory references, to identify "double fetches" and other OS vulnerabilities
C++
319
star
18

bochspwn-reloaded

A Bochs-based instrumentation performing kernel memory taint tracking to detect disclosure of uninitialized memory to ring 3
C++
284
star
19

Street-Party

Street Party is a suite of tools that allows the RTP streams of video conferencing implementations to be viewed and modified.
C++
242
star
20

DrSancov

DynamoRIO plugin to get ASAN and SanitizerCoverage compatible output for closed-source executables
C++
203
star
21

CompareCoverage

Clang instrumentation module for tracing variable and buffer comparisons in C/C++ and saving the coverage data to .sancov files
C++
200
star
22

Hyntrospect

PowerShell
179
star
23

reil

C++
59
star
24

.allstar

1
star
25

.github

1
star