• Stars
    star
    8,147
  • Rank 4,555 (Top 0.09 %)
  • Language
    Rust
  • License
    Other
  • Created over 5 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.

github repo Crates.io fearless concurrency

For more detail, see this introductory blogpost: https://phiresky.github.io/blog/2019/rga--ripgrep-for-zip-targz-docx-odt-epub-jpg/

rga will recursively descend into archives and match text in every file type it knows.

Here is an example directory with different file types:

demo/
├── greeting.mkv
├── hello.odt
├── hello.sqlite3
└── somearchive.zip
├── dir
│ ├── greeting.docx
│ └── inner.tar.gz
│ └── greeting.pdf
└── greeting.epub

rga output

Integration with fzf

rga-fzf

You can use rga interactively via fzf. Add the following to your ~/.{bash,zsh}rc:

rga-fzf() {
	RG_PREFIX="rga --files-with-matches"
	local file
	file="$(
		FZF_DEFAULT_COMMAND="$RG_PREFIX '$1'" \
			fzf --sort --preview="[[ ! -z {} ]] && rga --pretty --context 5 {q} {}" \
				--phony -q "$1" \
				--bind "change:reload:$RG_PREFIX {q}" \
				--preview-window="70%:wrap"
	)" &&
	echo "opening $file" &&
	xdg-open "$file"
}

And for your ~/.config/fish/config.fish:

function rga-fzf
    set RG_PREFIX 'rga --files-with-matches'
    if test (count $argv) -gt 1
        set RG_PREFIX "$RG_PREFIX $argv[1..-2]"
    end
    set -l file $file
    set file (
        FZF_DEFAULT_COMMAND="$RG_PREFIX '$argv[-1]'" \
        fzf --sort \
            --preview='test ! -z {} && \
                rga --pretty --context 5 {q} {}' \
            --phony -q "$argv[-1]" \
            --bind "change:reload:$RG_PREFIX {q}" \
            --preview-window='50%:wrap'
    ) && \
    echo "opening $file" && \
    open "$file"
end

INSTALLATION

Linux x64, macOS and Windows binaries are available in GitHub Releases.

Linux

Arch Linux

pacman -S ripgrep-all.

Nix

nix-env -iA nixpkgs.ripgrep-all

Debian-based

download the rga binary and get the dependencies like this:

apt install ripgrep pandoc poppler-utils ffmpeg

If ripgrep is not included in your package sources, get it from here.

rga will search for all binaries it calls in $PATH and the directory itself is in.

Windows

Install ripgrep-all via Chocolatey:

choco install ripgrep-all

Note that installing via chocolatey or scoop is the only supported download method. If you download the binary from releases manually, you will not get the dependencies (for example pdftotext from poppler).

If you get an error like VCRUNTIME140.DLL could not be found, you need to install vc_redist.x64.exe.

Homebrew/Linuxbrew

rga can be installed with Homebrew:

brew install rga

To install the dependencies that are each not strictly necessary but very useful:

brew install pandoc poppler tesseract ffmpeg

Compile from source

rga should compile with stable Rust (v1.36.0+, check with rustc --version). To build it, run the following (or the equivalent in your OS):

   ~$ apt install build-essential pandoc poppler-utils ffmpeg ripgrep cargo
   ~$ cargo install --locked ripgrep_all
   ~$ rga --version    # this should work now

Available Adapters

rga --rga-list-adapters

Adapters:

  • pandoc Uses pandoc to convert binary/unreadable text documents to plain markdown-like text Runs: pandoc --from= --to=plain --wrap=none --markdown-headings=atx
    Extensions: .epub, .odt, .docx, .fb2, .ipynb

  • poppler Uses pdftotext (from poppler-utils) to extract plain text from PDF files Runs: pdftotext - -
    Extensions: .pdf
    Mime Types: application/pdf

  • postprocpagebreaks Adds the page number to each line for an input file that specifies page breaks as ascii page break character. Mainly to be used internally by the poppler adapter.
    Extensions: .asciipagebreaks

  • ffmpeg Uses ffmpeg to extract video metadata/chapters, subtitles, lyrics, and other metadata
    Extensions: .mkv, .mp4, .avi, .mp3, .ogg, .flac

  • zip Reads a zip file as a stream and recurses down into its contents
    Extensions: .zip, .jar
    Mime Types: application/zip

  • decompress Reads compressed file as a stream and runs a different extractor on the contents.
    Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
    Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

  • tar Reads a tar file as a stream and recurses down into its contents
    Extensions: .tar

  • sqlite Uses sqlite bindings to convert sqlite databases into a simple plain text format
    Extensions: .db, .db3, .sqlite, .sqlite3
    Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using '--rga-adapters=+foo,bar':

USAGE:

rga [RGA OPTIONS] [RG OPTIONS] PATTERN [PATH ...]

FLAGS:

--rga-accurate

Use more accurate but slower matching by mime type

By default, rga will match files using file extensions. Some programs, such as sqlite3, don't care about the file extension at all, so users sometimes use any or no extension at all. With this flag, rga will try to detect the mime type of input files using the magic bytes (similar to the `file` utility), and use that to choose the adapter. Detection is only done on the first 8KiB of the file, since we can't always seek on the input (in archives).

--rga-no-cache

Disable caching of results

By default, rga caches the extracted text, if it is small enough, to a database in ${XDG*CACHE_DIR-~/.cache}/ripgrep-all on Linux, *~/Library/Caches/ripgrep-all_ on macOS, or C:\Users\username\AppData\Local\ripgrep-all on Windows. This way, repeated searches on the same set of files will be much faster. If you pass this flag, all caching will be disabled.

-h, --help

Prints help information

--rga-list-adapters

List all known adapters

--rga-print-config-schema

Print the JSON Schema of the configuration file

--rg-help

Show help for ripgrep itself

--rg-version

Show version of ripgrep itself

-V, --version

Prints version information

OPTIONS:

--rga-adapters=<adapters>...

Change which adapters to use and in which priority order (descending)

"foo,bar" means use only adapters foo and bar. "-bar,baz" means use all default adapters except for bar and baz. "+bar,baz" means use all default adapters and also bar and baz.

--rga-cache-compression-level=<compression-level>

ZSTD compression level to apply to adapter outputs before storing in cache db

Ranges from 1 - 22 [default: 12]

--rga-config-file=<config-file-path>

--rga-max-archive-recursion=<max-archive-recursion>

Maximum nestedness of archives to recurse into [default: 4]

--rga-cache-max-blob-len=<max-blob-len>

Max compressed size to cache

Longest byte length (after compression) to store in cache. Longer adapter outputs will not be cached and recomputed every time.

Allowed suffixes on command line: k M G [default: 2000000]

--rga-cache-path=<path>

Path to store cache db [default: /home/phire/.cache/ripgrep-all]

-h shows a concise overview, --help shows more detail and advanced options.

All other options not shown here are passed directly to rg, especially [PATTERN] and [PATH ...]

Development

To enable debug logging:

export RUST_LOG=debug
export RUST_BACKTRACE=1

Also remember to disable caching with --rga-no-cache or clear the cache (~/Library/Caches/rga on macOS, ~/.cache/rga on other Unixes, or C:\Users\username\AppData\Local\rga on Windows) to debug the adapters.

Nix and Direnv

You can use the provided flake.nix to setup all build- and run-time dependencies:

  1. Enable Flakes in your Nix configuration.
  2. Add direnv to your profile: nix profile install nixpkgs#direnv
  3. cd into the directory where you have cloned this directory.
  4. Allow use of .envrc: direnv allow
  5. After the dependencies have been installed, your shell will now have all of the necessary development dependencies.

More Repositories

1

sql.js-httpvfs

Hosting read-only SQLite databases on static file hosters like Github Pages
TypeScript
3,469
star
2

sqlite-zstd

Transparent dictionary-based row-level compression for SQLite
Rust
1,447
star
3

blog

Source code of my personal blog
TypeScript
339
star
4

world-development-indicators-sqlite

Python
152
star
5

procedural-cities

Information about procedural city generation
TeX
128
star
6

nmap-log-parse

Logs which devices are in your local network and draws graphs
TypeScript
127
star
7

pandoc-url2cite

Effortlessly and transparently add correctly styled citations to your markdown paper given only a URL
TypeScript
122
star
8

youtube-sponsorship-stats

TypeScript
121
star
9

timetrackrs

An automated time tracker (WIP)
Rust
113
star
10

tv-show-ratings

Compare the episode ratings of TV shows
TypeScript
75
star
11

neural-network-demo

Demonstration and visualization of feed-forward neural networks running in the browser
TypeScript
60
star
12

tuxguitar

unofficial mirror from sourceforge svn
Java
57
star
13

backchannel-prediction

Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Python
54
star
14

tantivy-wasm

TypeScript
42
star
15

fbstats

generate facebook messaging statistics
TypeScript
38
star
16

levenshtein-demo

TypeScript
35
star
17

typed-socket.io

A library for fully typed client-server communication with socket.io and TypeScript.
TypeScript
30
star
18

encrypted-gist

Storing files in github gists, with client side transparent encryption and authentication
TypeScript
28
star
19

webrtc-remote-touch-pen-input

JavaScript
24
star
20

ts-histdbimport

Imports a `.zsh_history` file into a https://github.com/larkery/zsh-histdb sqlite database.
JavaScript
24
star
21

convolution-demo

Visualization of convolution on pairs of simple functions
TypeScript
18
star
22

thought-forge-ai

Generate 30-60 second "deep thought" TikTok-style video including a monologue, moving video scenes, music, and subtitles.
TypeScript
16
star
23

ebv-theta-to-mqtt

Perl
15
star
24

emojidome

Interactive viewer of the results of the Emojidome XKCD
TypeScript
14
star
25

dupegone

small fast duplicate file finder in c++
C++
11
star
26

redis-remotify

A tiny TypeScript library for fully typed remote calls via Redis.
TypeScript
9
star
27

mima

Mima Compiler and Interpreter for the browser
JavaScript
9
star
28

Gelddruckmaschine

Telegram bot that finds arbitrage opportunities between multiple crypto trading sites to print money mostly risk-free
JavaScript
8
star
29

warc-sqlite

POC of converting a set of WARC web archive files to a SQLite database and querying it
Python
7
star
30

phiresky.github.io

Lazy Github homepage generator
TypeScript
7
star
31

deep-intellisense

IntelliSense based on deep learning using char-rnn
TypeScript
6
star
32

endoh1-ts

Deobfuscating the ASCII fluid simulater and converting it to TypeScript
JavaScript
6
star
33

lemmy-federation-state

Quick visualization of the lemmy federation state
TypeScript
6
star
34

youtube-watch-history-parse

TypeScript
5
star
35

rpi-autousbupload

Automatically uploads photos from usb devices to an ftp server. Optimized for robustness on an Raspberry Pi.
Python
5
star
36

dfa2regex

Converts Deterministic finite automata to regular expressions.
JavaScript
5
star
37

minecraft-stats

Parses Vanilla Minecraft Server Statistics and displays graphs
JavaScript
4
star
38

fix-messy-movie-folder

Try to identify movie files in messy folders and sort them into a predefined structure
Java
4
star
39

rust-brotli-wasm

Two experiments of the brotli encoder / decoder compiled from Rust to WebAssembly.
Rust
4
star
40

RobinHood-TheLegendOfSherwood-Resolution-Patcher

Patch the resolution of Robin Hood: The Legend of Sherwood to support any resolution (e.g. 1920x1080)
C++
4
star
41

socket.io-distributor

Simple load leveling for socket.io
TypeScript
3
star
42

portfolio-bot

Telegram Bot that allows you to get information on your investments (stocks, ETFs, etc)
TypeScript
3
star
43

coronavirus-reproduction-analysis

Jupyter Notebook
3
star
44

qalc-react

Unit Calculator
TypeScript
3
star
45

gaussian-mixtures-demo

TypeScript
2
star
46

pythoven

Python
2
star
47

joint-multilingual-speech-recognition-and-language-id

TeX
2
star
48

tree-magic-cli

Rust
2
star
49

how-long-am-i-working

use google location history to graph how much time you've spent at a location (e.g. at work)
TypeScript
2
star
50

pingplot

plot a ping log
R
2
star
51

csv-cooccurrence-graph

Rust
2
star
52

RaspberrySmartScaleReceiver

Program to receive the body weight and body composition analysis data from a Soehnle 63760 BB smart scale using a 433 Mhz receiver on the Raspberry Pi.
C++
2
star
53

phiresky

1
star
54

rga-windows-test

Rust
1
star
55

bachelor-thesis

TeX
1
star
56

bayesian-aggregation-for-swarm-reinforcement-learning

Python
1
star
57

score-voting-tool

Rust
1
star
58

masters-thesis

HTML
1
star
59

algo2-summary

compact summary of stuff from the algorithms 2 lecture. Zusammenfassung der Algorithmen II Vorlesung am KIT
TeX
1
star
60

ocr-pdf-via-document-ai

OCR a set of images via the Google Cloud API
JavaScript
1
star
61

plangraph-impl

Implementation of some algorithms for planar graphs
JavaScript
1
star
62

guitar-tabs

Static hosted version of the guitar tabs portion of my old (<2010) website
CSS
1
star
63

plangraph

Mitschrieb zur Vorlesung planare Graphen KIT SS2015
Python
1
star
64

kogsys-demos

overview page for all the lecture demonstrations
HTML
1
star
65

ts-boilerplates

just some simple boilerplates for me to use
TypeScript
1
star
66

nushell-history-skim

Rust
1
star
67

prosem-proto

Simple procedural town generation (for Proseminar)
TypeScript
1
star
68

fourier-series-demo

TypeScript
1
star