  • Stars: 201
  • Rank: 194,491 (Top 4%)
  • Language: C++
  • License: MIT License
  • Created over 1 year ago
  • Updated over 1 year ago

Repository Details

Recursively search directories for a regex pattern

Highlights

  • Searches recursively for a regex pattern using Intel Hyperscan.
  • When a git repository is detected, searches the repository index using libgit2.
  • Similar to grep, ripgrep, ugrep, The Silver Searcher, etc.
  • Written in C++17; uses multi-threading and SIMD.
  • See the USAGE GUIDE and the implementation notes in the repository.
  • Not cross-platform; tested on Linux.
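The highlights describe a recursive, multi-threaded regex search. The sketch below shows that overall shape using only the standard library. It walks the tree with std::filesystem::recursive_directory_iterator, uses std::regex in place of Intel Hyperscan, and uses a shared atomic counter in place of a lock-free queue such as concurrentqueue (one of the project's dependencies). The names search_directory and Match are hypothetical, and this is not hypergrep's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <regex>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// One reported match: file path, 1-based line number, and the line text.
struct Match {
    std::string path;
    std::size_t line_number;
    std::string line;
};

// Recursively search `root` for lines matching `pattern`, sharing the files
// among `num_threads` worker threads. std::regex stands in for Hyperscan:
// much slower, but it needs no extra dependencies.
std::vector<Match> search_directory(const fs::path& root,
                                    const std::string& pattern,
                                    unsigned num_threads) {
    assert(num_threads > 0);

    // Collect the regular files first, skipping unreadable directories.
    std::vector<fs::path> files;
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied)) {
        if (entry.is_regular_file()) files.push_back(entry.path());
    }

    // Compile once on this thread so a bad pattern throws std::regex_error here.
    const std::regex compiled(pattern);

    std::atomic<std::size_t> next{0};  // index of the next unclaimed file
    std::mutex results_mutex;
    std::vector<Match> results;

    auto worker = [&] {
        const std::regex re = compiled;  // conservative per-thread copy
        // Each fetch-and-increment claims exactly one file.
        for (std::size_t i = next++; i < files.size(); i = next++) {
            std::ifstream in(files[i]);
            std::string line;
            std::size_t line_number = 0;
            std::vector<Match> local;  // buffer per file to keep locking rare
            while (std::getline(in, line)) {
                ++line_number;
                if (std::regex_search(line, re)) {
                    local.push_back({files[i].string(), line_number, line});
                }
            }
            std::lock_guard<std::mutex> lock(results_mutex);
            results.insert(results.end(), local.begin(), local.end());
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return results;  // order depends on which thread finished which file
}
```

A call such as search_directory(".", R"(Sherlock [A-Z]\w+)", std::max(1u, std::thread::hardware_concurrency())) returns every matching line. The std::max guard is there because hardware_concurrency() may return 0. Results come back in no particular order because threads finish files at different times.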

Performance

The following tests compare the performance of hypergrep against ag (The Silver Searcher), ugrep, and ripgrep. Timings are in seconds.

System Details

Processor: 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz
Instruction Set Extensions: Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2, Intel® AVX-512
Installed RAM: 32.0 GB (31.9 GB usable)
SSD: ADATA SX8200PNP
OS: Ubuntu 20.04 LTS
C++ Compiler: g++ (Ubuntu 11.1.0-1ubuntu1-20.04) 11.1.0

Vcpkg Installed Libraries

vcpkg commit: 662dbb5

argparse: 2.9
concurrentqueue: 1.0.3
fmt: 10.0.0
hyperscan: 5.4.2
libgit2: 1.6.4

Single Large File Search: OpenSubtitles.raw.en.txt

The following searches are performed on a single large file cached in memory (~13 GB, decompressed from OpenSubtitles.raw.en.gz).

Count number of times Holmes did something
hgrep -c 'Holmes did \w'
Line count: 27 | ag: n/a | ugrep: 1.820 | ripgrep: 1.022 | hypergrep: 0.696

Literal with Regex Suffix
hgrep -nw 'Sherlock [A-Z]\w+' en.txt
Line count: 7882 | ag: n/a | ugrep: 1.812 | ripgrep: 1.509 | hypergrep: 0.803

Simple Literal
hgrep -nw 'Sherlock Holmes' en.txt
Line count: 7653 | ag: 15.764 | ugrep: 1.888 | ripgrep: 1.524 | hypergrep: 0.658

Simple Literal (case insensitive)
hgrep -inw 'Sherlock Holmes' en.txt
Line count: 7871 | ag: 15.599 | ugrep: 6.945 | ripgrep: 2.162 | hypergrep: 0.650

Alternation of Literals
hgrep -n 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt
Line count: 10078 | ag: n/a | ugrep: 6.886 | ripgrep: 1.836 | hypergrep: 0.689

Alternation of Literals (case insensitive)
hgrep -in 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' en.txt
Line count: 10333 | ag: n/a | ugrep: 7.029 | ripgrep: 3.940 | hypergrep: 0.770

Words surrounding a literal string
hgrep -n '\w+[\x20]+Holmes[\x20]+\w+' en.txt
Line count: 5020 | ag: n/a | ugrep: 6m 11s | ripgrep: 1.523 | hypergrep: 0.638

Git Repository Search: torvalds/linux

The following searches are performed on the entire Linux kernel source tree (after running make defconfig && make -j8). The commit used is f1fcb.

Simple Literal
hgrep -nw 'PM_RESUME'
Line count: 9 | ag: 2.807 | ugrep: 0.316 | ripgrep: 0.147 | hypergrep: 0.140

Simple Literal (case insensitive)
hgrep -niw 'PM_RESUME'
Line count: 39 | ag: 2.904 | ugrep: 0.435 | ripgrep: 0.149 | hypergrep: 0.141

Regex with Literal Suffix
hgrep -nw '[A-Z]+_SUSPEND'
Line count: 536 | ag: 3.080 | ugrep: 1.452 | ripgrep: 0.148 | hypergrep: 0.143

Alternation of four literals
hgrep -nw '(ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)'
Line count: 16 | ag: 3.085 | ugrep: 0.410 | ripgrep: 0.153 | hypergrep: 0.146

Unicode Greek
hgrep -n '\p{Greek}'
Line count: 111 | ag: 3.762 | ugrep: 0.484 | ripgrep: 0.345 | hypergrep: 0.146
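The git-repository benchmarks depend on hypergrep detecting that the search root sits inside a repository before it switches to searching the index with libgit2. The sketch below covers only that detection step. It assumes "detected" means some ancestor directory contains a .git entry. find_git_root is a hypothetical helper, and libgit2's own git_repository_discover performs a more complete version of this search.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Walk from `start` toward the filesystem root and return the first directory
// that contains a ".git" entry. ".git" is usually a directory, but it is a
// plain file in linked worktrees and submodules, so any entry counts here.
// Returns std::nullopt if no ancestor qualifies.
std::optional<fs::path> find_git_root(const fs::path& start) {
    std::error_code ec;
    fs::path dir = fs::absolute(start, ec).lexically_normal();
    if (ec) return std::nullopt;
    assert(dir.is_absolute());

    while (true) {
        if (fs::exists(dir / ".git", ec)) return dir;
        const fs::path parent = dir.parent_path();
        if (parent.empty() || parent == dir) return std::nullopt;  // reached "/"
        dir = parent;
    }
}
```

Called with a subdirectory inside a kernel checkout, it would return the checkout's top-level directory.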

Git Repository Search: apple/swift

The following searches are performed on the entire Apple Swift source tree. The commit used is 3865b.

Function/struct/enum declaration followed by a valid identifier and an opening parenthesis
hgrep -n '(func|struct|enum)\s+[A-Za-z_][A-Za-z0-9_]*\s*\('
Line count: 59026 | ag: 1.148 | ugrep: 0.954 | ripgrep: 0.154 | hypergrep: 0.090

Words starting with alphabetic characters followed by at least 2 digits
hgrep -nw '[A-Za-z]+\d{2,}'
Line count: 127858 | ag: 1.169 | ugrep: 1.238 | ripgrep: 0.156 | hypergrep: 0.095

Words starting with an uppercase letter, followed by alphanumeric characters and/or underscores
hgrep -nw '[A-Z][a-zA-Z0-9_]*'
Line count: 2012372 | ag: 3.131 | ugrep: 2.598 | ripgrep: 0.550 | hypergrep: 0.482

guard let statement followed by a valid identifier
hgrep -n 'guard\s+let\s+[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*\w+'
Line count: 839 | ag: 0.828 | ugrep: 0.174 | ripgrep: 0.054 | hypergrep: 0.047
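The snippet below applies the first Swift pattern to a few hypothetical Swift lines to show what it matches. It uses std::regex for illustration only. hypergrep compiles patterns with Hyperscan, and std::regex's ECMAScript grammar happens to accept this particular pattern unchanged.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// The declaration pattern from the first Swift benchmark, as a raw string.
const std::regex kDeclaration(R"((func|struct|enum)\s+[A-Za-z_][A-Za-z0-9_]*\s*\()");

// Return, in order, the lines that contain a match anywhere in them.
std::vector<std::string> matching_lines(const std::vector<std::string>& lines,
                                        const std::regex& re) {
    std::vector<std::string> out;
    for (const auto& line : lines) {
        if (std::regex_search(line, re)) out.push_back(line);
    }
    return out;
}
```

With these inputs, a line like func greet(name: String) { matches. struct Point { does not, because the pattern requires an opening parenthesis right after the identifier. Neither does let f = funcName(1), because func must be followed by whitespace.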

Directory Search: /usr

The following searches are performed on the /usr directory.

Any HTTP, HTTPS, or FTP URL
hgrep "(https?|ftp)://[^\s/$.?#].[^\s]*"
Line count: 13682 | ag: 4.597 | ugrep: 2.894 | ripgrep: 0.305 | hypergrep: 0.171

Any IPv4 address
hgrep -w "(?:\d{1,3}\.){3}\d{1,3}"
Line count: 12643 | ag: 4.727 | ugrep: 2.340 | ripgrep: 0.324 | hypergrep: 0.166

Any email address
hgrep -w "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
Line count: 47509 | ag: 5.477 | ugrep: 37.209 | ripgrep: 0.494 | hypergrep: 0.220

Any date in MM/DD/YYYY format
hgrep "(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(19|20)\d{2}"
Line count: 116 | ag: 4.239 | ugrep: 1.827 | ripgrep: 0.251 | hypergrep: 0.163

Count the number of hex values
hgrep -cw "(?:0x)?[0-9A-Fa-f]+"
Line count: 68042 | ag: 5.765 | ugrep: 28.691 | ripgrep: 1.439 | hypergrep: 0.611

Search C/C++ source files for a literal
hgrep --filter "\.(c|cpp|h|hpp)$" test
Line count: 7355 | ag: n/a | ugrep: 0.505 | ripgrep: 0.118 | hypergrep: 0.079
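Several of the /usr patterns check only the shape of the text. The sketch below runs two of them through std::regex, ignoring the -w flag. std::regex's ECMAScript grammar accepts both patterns as written, while hypergrep uses Hyperscan. The examples show that the date pattern accepts 02/31/2020 and the IPv4 pattern accepts 999.999.999.999. The counts above are therefore counts of date-like and address-like text, not validated values.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Patterns copied from the /usr benchmarks. Both check only the shape of the
// text, not whether the value is semantically valid.
const std::regex kDate(R"((0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/(19|20)\d{2})");
const std::regex kIPv4(R"((?:\d{1,3}\.){3}\d{1,3})");

// True if `text` contains a match anywhere (grep-style search, not a full match).
bool contains(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re);
}
```

Validating calendar dates or octet ranges would require extra checks after the match, which a grep-style search does not perform.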

Build

Install Dependencies with vcpkg

git clone https://github.com/microsoft/vcpkg
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install concurrentqueue fmt argparse libgit2 hyperscan

Build hypergrep using cmake and vcpkg

Clone the repository

git clone https://github.com/p-ranav/hypergrep
cd hypergrep

If cmake is older than 3.19

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=<path_to_vcpkg>/scripts/buildsystems/vcpkg.cmake ..
make

If cmake is newer than 3.19

Use the release preset:

export VCPKG_ROOT=<path_to_vcpkg>
cmake -B build -S . --preset release
cmake --build build

Binary Portability

To build a portable x86_64 binary, invoke cmake with the -DBUILD_PORTABLE=on option. This compiles with -march=x86-64 -mtune=generic and links with -static-libgcc -static-libstdc++. The GCC runtime and the C++ standard library are then linked statically into the binary, which reduces dependencies on the target system.
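With -march=x86-64, the compiler may only assume the baseline x86-64 instruction set, which includes SSE2. Newer extensions such as AVX2 or AVX-512, like those listed for the benchmark machine, therefore have to be detected at run time rather than assumed. As an illustration only, GCC and Clang expose __builtin_cpu_supports for such checks on x86. The helper below is hypothetical and is not taken from hypergrep.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the SIMD extensions, from a fixed list, that the running CPU supports.
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets; it needs a
// string literal, so each extension is checked explicitly.
std::vector<std::string> supported_simd_extensions() {
    __builtin_cpu_init();  // required only if called before static constructors run
    std::vector<std::string> found;
    if (__builtin_cpu_supports("sse2")) found.push_back("sse2");  // x86-64 baseline
    if (__builtin_cpu_supports("ssse3")) found.push_back("ssse3");
    if (__builtin_cpu_supports("sse4.2")) found.push_back("sse4.2");
    if (__builtin_cpu_supports("avx2")) found.push_back("avx2");
    if (__builtin_cpu_supports("avx512f")) found.push_back("avx512f");
    return found;
}
```

On the benchmark machine above, all five names would be reported. On an older CPU, a portable build still runs but would see a shorter list.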
