• Stars
    star
    257
  • Rank 153,258 (Top 4 %)
  • Language
    C++
  • License
    Other
  • Created over 11 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

experimental and very fast implementation of a grep

grab - simple, but very fast grep

greppin

This is my own, experimental, parallel version of grep so I can test various strategies to speed up access to large directory trees. On Flash storage or SSDs, you can easily outsmart common greps by up a factor of 8.

Options:

Usage: ./greppin [-rIOLlsSH] [-n <cores>] <regex> <path>

	-2	-- use PCRE2 instead of PCRE
	-O	-- print file offset of match
	-l	-- do not print the matching line (Useful if you want
		   to see _all_ offsets; if you also print the line, only
		   the first match in the line counts)
	-s	-- single match; dont search file further after first match
		   (similar to grep on a binary)
	-H	-- use hyperscan lib for scanning
	-S	-- only for hyperscan: interpret pattern as string literal instead of regex
	-L	-- machine has low mem; half chunk-size (default 2GB)
		   may be used multiple times
	-I	-- enable highlighting of matches (useful)
	-n	-- Use multiple cores in parallel (omit for single core)
	-r	-- recurse on directory

grab uses the pcre library, so basically its equivalent to a grep -P -a. The -P is important, since Perl-Compatible Regular Expressions have different characteristics than basic regexes.

Build

There are two branches. master and greppin. Master is the 'traditional' grab that should compile and run on most POSIX systems. greppin comes with its own optimized and parallelized version of nftw() and readdir(), which again doubles speed on the top of speedup that the master branch already provides. The greppin branch runs on Linux, BSD and OSX. greppin also comes with support for Intel's hyperscan libraries that try to exploit CPU's SIMD instructions if possible (AVX2, AVX512 etc.) when compiling the regex pattern into JIT code.

You will most likely want to build the greppin branch:

$ git checkout greppin
[...]
$ cd src; make
[...]

Make sure you have the pcre and pcre2 library packages installed. On BSD systems you need gmake instead of make. If you want to do cutting edge tech with greppin's multiple regex engine and hyperscan support, you first need to get and build that:

$ git clone https://github.com/intel/hyperscan
[...]
$ cd hyperscan
$ mkdir build; cd build
$ cmake -DFAT_RUNTIME=1 -DBUILD_STATIC_AND_SHARED=1 ..
[...]
$ make
[...]

This will build so called fat runtime of the hyperscan libs which contain support for all CPU families in order to select the right compilation pattern at runtime for most performance. Once the build finishes, you build greppin against that:

(inside grab cloned repo)

$ cd src
$ HYPERSCAN_BUILD=/path/to/hyperscan/build make -f Makefile.hs
[...]

This will produce a greppin binary that enables the -H option to load a different engine at runtime, trying to exploit all possible performance bits.

You could link it against already installed libs, but the API just recently added some functions in the 5.x version and most distros ship with 4.x.

Why is it faster?

grab is using mmap(2) and matches the whole file blob without counting newlines (which grep is doing even if there is no match [as of a grep code review of mine in 2012; things may be different today]) which is a lot faster than read(2)-ing the file in small chunks and counting the newlines. If available, grab also uses the PCRE JIT feature. However, speedups are only measurable on large file trees or fast HDDs or SSDs. In the later case, the speedup can be really drastically (up to 3 times faster) if matching recursively and in parallel. Since storage is the bottleneck, parallelizing the search on HDDs makes no sense, as the seeking takes more time than just doing stuff in linear.

Additionally, grab is skipping files which are too small to contain the regular expression. For larger regex's in a recursive search, this can skip quite good amount of files without even opening them.

A quite new pcre lib is required, on some older systems the build can fail due to a missing PCRE_INFO_MINLENGTH and pcre_study().

Files are mmaped and matched in chunks of 1Gig. For files which are larger, the last 4096 byte (1 page) of a chunk are overlapped, so that matches on a 1 Gig boundary can be found. In this case, you see the match doubled (but with the same offset).

If you measure grep vs. grab, keep in mind to drop the dentry and page caches between each run: echo 3 > /proc/sys/vm/drop_caches

Note, that grep will print only a 'Binary file matches', if it detects binary files, while grab will print all matches, unless -s is given. So, for a speed test you have to search for an expression that does not exist in the data, in order to enforce searching of the entire files.

grab was made to quickly grep through large directory trees without indexing. The original grep has by far a more complete option-set. The speedup for a single file match is very small, if at all measureable.

For SSDs, the multicore option makes sense. For HDDs it does not, since the head has to be positioned back and forth between the threads, potentially destroying the locality principle and killing performance.

The greppin branch features its own lockfree parallel version of nftw(), so the time of idling of N - 1 cores when the 1st core builds the directory tree can also be used for working.

Whats left to note: grab will traverse directories physically, i.e. it will not follow symlinks.

spot

spot is the parallel version of find. It supports the most frequently used options as you know it. Theres not much more to tell about it, just try it out.

Examples

This shows the speedup on a 4-core machine with a search on a SSD:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time grep -r foobardoesnotexist /source/linux

real	0m34.811s
user	0m3.710s
sys	0m10.936s
root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time grab -r foobardoesnotexist /source/linux

real	0m31.629s
user	0m4.984s
sys	0m8.690s
root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time grab -n 2 -r foobardoesnotexist /source/linux

real	0m15.203s
user	0m3.689s
sys	0m4.665s
root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time grab -n 4 -r foobardoesnotexist /source/linux

real	0m13.135s
user	0m4.023s
sys	0m5.581s

With greppin branch:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time grep -a -P -r linus /source/linux/|wc -l
16918

real    1m12.470s
user    0m49.548s
sys     0m6.162s
root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# time greppin -n 4 -r linus /source/linux/|wc -l
16918

real    0m8.773s
user    0m4.670s
sys     0m5.837s
root@linux:~#

Yes! ~ 9s vs. ~ 72s! Thats 8x as fast on a 4-core SSD machine as the traditional grep.

Just to proof that it resulted in the same output:

root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# greppin -n 4 -r linus /source/linux/|sort|md5sum
a1f9fe635bd22575a4cce851e79d26a0  -
root@linux:~# echo 3 > /proc/sys/vm/drop_caches
root@linux:~# grep -P -a -r linus /source/linux/|sort|md5sum
a1f9fe635bd22575a4cce851e79d26a0  -
root@linux:~#

In the single core comparison, speedup also depends on which CPU the kernel actually scheduls the grep, so a grab may or may not be faster (mostly it is). If the load is equal among the single-core tests, grab will see a speedup if searching on large file trees. On multi-core setups, grab can benefit ofcorse.

ripgrep comparison

The project can be found here.

The main speedup thats inside their benchmark tables stems from the fact that ripgrep ignores a lot of files (notably dotfiles) when invoked without special options as well as treating binary files as a single-match target (similar to grep). In order to have comparable results, keep in mind to (4 is the number of cores):

  • echo 3 > /proc/sys/vm/drop_caches between each run
  • Add -j 4 -a --no-unicode --no-pcre2-unicode -uuu --mmap to ripgrep, since it will by default match Unicode which is 3 times slower, and tries to compensate the speedloss by skipping 'ignore'-based files. -e is faster than -P, so better choose -e, but thats not as powerful as a PCRE
  • redirect the output to /dev/null to avoid tty based effects
  • add -H -n 4 to greppin if you want best performance. -H is PCRE compatible with only very few exceptions (according to hyperscan docu)
  • setfattr -n user.pax.flags -v "m" /path/to/binary if you run on grsec systems and require rwx JIT mappings

Then just go ahead and check the timings. Even when not using hyperscan, greppin is significantly faster than rg when using PCRE2 expressions (PCRE2 vs. PCRE2) and still faster when comparing the fastest expressions (-e vs. hyperscan).

More Repositories

1

sshttp

SSH/HTTP(S) multiplexer. Run a webserver and a sshd on the same port w/o changes.
C++
861
star
2

opmsg

opmsg message encryption
C++
744
star
3

fraud-bridge

ICMP and DNS tunneling via IPv4 and IPv6
C++
189
star
4

crash

crypted admin shell: SSH-like strong crypto remote admin shell for Linux, BSD, Android, Solaris and OSX
C++
174
star
5

lophttpd

lots of performance (or lots of porn, if you prefer) httpd: Easy, chrooted, fast and simple to use HTTP server for static content. Runs on Linux, BSD, Android and OSX/Darwin. It's free but if you like it, consider donating to the EFF: https://supporters.eff.org/donate
C++
135
star
6

psc

E2E encryption for multi-hop tty sessions or portshells + TCP/UDP port forward
C++
109
star
7

troubleshooter

setroubleshootd xSports
C
93
star
8

fernmelder

async mass DNS resolver
C++
82
star
9

drops

opmsg p2p transport network
C++
70
star
10

plasmapulsar

PLASMA PULSAR
C
69
star
11

injectso

C
52
star
12

libusipp

unix socket interface for C++ raw IP/IP6/UDP/TCP, Layer2 etc. framework
C++
39
star
13

harddns

RFC8484 and DoH/JSON resolver
C++
38
star
14

pam_schroedinger

Uncertainty for brute forcers during login.
C++
25
star
15

odd

optimized dd
C
20
star
16

qdns

QUANTUM-DNS server
C++
18
star
17

call-graphs

17
star
18

u2f

U2F toolset
C++
17
star
19

QI

Quantum Insert/FoxAcid case study
C++
15
star
20

sslps

OpenSSL Privilege Separation
C
15
star
21

greenday

DGC forensics
Perl
13
star
22

devpops

Companion Worm research
Go
13
star
23

tensor-pwn

AI fun
Python
11
star
24

loaded

IPv4 and IPv6 load balancer
C++
9
star
25

hoschi

hoschi BTC mapper
C++
9
star
26

contrib

Various HTTP, SSH, SSL, localhost etc. related contributions of me, which are too small to have an own repo each
Python
9
star
27

dropsview

Fancy ncurses frontend for opmsg + drops
C++
8
star
28

7350topless

C
8
star
29

aio

scalable Linux implementation for IEEE 1003.1-2004 AIO
C
8
star
30

tinkershell

Perl
7
star
31

number

C++
7
star
32

opmsg-qr

opmsg QR convenience
C
6
star
33

massbind

mass adding ip6 addresses
C
5
star
34

plugmein

pcap USB keyboard sniffer
C++
4
star
35

papyrus

C
3
star
36

test

just ignore
1
star
37

maps

some geojson maps
1
star
38

cloud-clock

C++
1
star
39

polyglots

some polyglots
Shell
1
star
40

bananajoe

Banana Boomsh
C
1
star