  • Stars: 106
  • Rank: 325,871 (Top 7 %)
  • Language: Python
  • License: GNU General Publi...
  • Created over 7 years ago
  • Updated 12 months ago


Repository Details

Grep Web pages with extra features like JS deobfuscation and OCR

WebGrep

Grep Web pages and their resources.

This self-contained tool relies on the well-known grep tool to search Web pages. It binds nearly every option of the original tool and adds features such as deobfuscating JavaScript or applying OCR to images before grepping downloaded resources.

$ pip install webgrep-tool
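
To make the "relies on grep" idea concrete, here is a minimal sketch that pipes already-downloaded text into the system `grep` binary. The helper name `grep_text` is hypothetical and is not taken from webgrep's code. The sketch assumes `grep` is available on the PATH, and the real tool may invoke grep differently.

```python
import subprocess


def grep_text(pattern, text, options=()):
    """Return the lines of `text` that grep selects for `pattern`.

    Hypothetical helper for illustration only; not webgrep's actual code.
    """
    # "--" ends option parsing so a pattern starting with "-" is not read as a flag.
    cmd = ["grep", *options, "--", pattern]
    proc = subprocess.run(cmd, input=text, capture_output=True, text=True)
    # grep exits with 0 on a match, 1 on no match, and a higher code on error.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.splitlines()


if __name__ == "__main__":
    html = "<h1>Welcome home</h1>\n<p>Nothing here</p>\n"
    # "-i" is passed straight through to grep, mirroring how options are bound.
    print(grep_text("welcome", html, options=["-i"]))
```

Passing the options through unchanged is what lets a wrapper expose grep's own flags (`-i`, `-E`, `-o`, ...) without reimplementing them.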

⏩ Quick Start

  1. Help
$ webgrep --help
usage: webgrep [OPTION]... PATTERN [URL]...

Search for PATTERN in each input URL and its related resources
(images, scripts and style sheets).
By default,
- resources are NOT downloaded
- response HTTP headers are NOT included in grepping ; use '--include-headers'
- PATTERN is a basic regular expression (BRE) ; use '-E' for extended (ERE)
Important note: webgrep does not handle recursion (in other words, it does not
               spider additional web pages).
Examples:
 webgrep example http://www.example.com     # will only grep on HTML code
 webgrep -r example http://www.example.com  # will only grep on LOCAL images, ...
 webgrep -R example http://www.example.com  # will only grep on ALL images, ...

Regexp selection and interpretation:
 -e REGEXP, --regexp REGEXP
                       use PATTERN for matching
 -f FILE, --file FILE  obtain PATTERN from FILE
 -E, --extended-regexp
                       PATTERN is an extended regular expression (ERE)
 -F, --fixed-strings   PATTERN is a set of newline-separated fixed strings
 -G, --basic-regexp    PATTERN is a basic regular expression (BRE)
 -P, --perl-regexp     PATTERN is a Perl regular expression
 -i, --ignore-case     ignore case distinctions
 -w, --word-regexp     force PATTERN to match only whole words
 -x, --line-regexp     force PATTERN to match only whole lines
 -z, --null-data       a data line ends in 0 byte, not newline

Miscellaneous:
 -s, --no-messages     suppress error messages
 -v, --invert-match    select non-matching lines
 -V, --version         print version information and exit
 --help                display this help and exit
 --verbose             verbose mode
 --keep-files          keep temporary files in the temporary directory
 --temp-dir TMP        define the temporary directory (default: /tmp/webgrep)

Output control:
 -m NUM, --max-count NUM
                       stop after NUM matches
 -b, --byte-offset     print the byte offset with output lines
 -n, --line-number     print line number with output lines
 --line-buffered       flush output on every line
 -H, --with-filename   print the file name for each match
 -h, --no-filename     suppress the file name prefix on output
 --label LABEL         use LABEL as the standard input filename prefix
 -o, --only-matching   show only the part of a line matching PATTERN
 -q, --quiet, --silent
                       suppress all normal output
 --binary-files TYPE   assume that binary files are TYPE;
                       TYPE is 'binary', 'text', or 'without-match'
 -a, --text            equivalent to --binary-files=text
 -I                    equivalent to --binary-files=without-match
 -L, --files-without-match
                       print only names of FILEs containing no match
 -l, --files-with-match
                       print only names of FILEs containing matches
 -c, --count           print only a count of matching lines per FILE
 -T, --initial-tab     make tabs line up (if needed)
 -Z, --null            print 0 byte after FILE name

Context control:
 -B NUM, --before-context NUM
                       print NUM lines of leading context
 -A NUM, --after-context NUM
                       print NUM lines of trailing context
 -C NUM, --context NUM
                       print NUM lines of output context

Web options:
 -r, --local-resources
                       also grep local resources (same-origin)
 -R, --all-resources   also grep all resources (even non-same-origin)
 --include-headers     also grep HTTP headers
 --cookie COOKIE       use a session cookie in the HTTP headers
 --referer REFERER     provide the referer in the HTTP headers

Proxy settings (by default, system proxy settings are used):
 -d, --disable-proxy   manually disable proxy
 --http-proxy HTTP     manually set the HTTP proxy
 --https-proxy HTTPS   manually set the HTTPS proxy

Please report bugs on GitHub: https://github.com/dhondta/webgrep

  2. Example
$ ./webgrep -R Welcome https://github.com
      Welcome home, <br>developers
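
The difference between `-r` (same-origin resources) and `-R` (all resources) described in the help text can be sketched as a URL filter. This is an illustration under assumptions: the function names `origin` and `select_resources` are hypothetical, and webgrep's actual same-origin check may differ from the scheme/host/port comparison used here.

```python
from urllib.parse import urljoin, urlsplit


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return parts.scheme, parts.hostname, parts.port or default_port


def select_resources(page_url, links, mode="none"):
    """Pick which resource links to download.

    mode is "none" (default, nothing downloaded), "local" (like -r)
    or "all" (like -R). Hypothetical helper, not webgrep's code.
    """
    if mode == "none":
        return []
    # Resolve relative links against the page URL before comparing origins.
    absolute = [urljoin(page_url, link) for link in links]
    if mode == "all":
        return absolute
    return [u for u in absolute if origin(u) == origin(page_url)]


if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    links = ["/img/logo.png", "https://cdn.example.net/app.js", "style.css"]
    print(select_resources(page, links, "local"))
    print(select_resources(page, links, "all"))
```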

📌 Resource Handlers

Definitions:

  • Resource (what is being processed): Web page, images, Javascript, CSS
  • Handler (how a resource is processed): CSS unminifying, OCR, deobfuscation, EXIF data retrieval, ...

The handlers are defined in the # --...-- HANDLERS SECTION --...-- of the code. Currently available handlers:

  1. Images
  • EXIF: using exiftool
  • Steganography: using steghide (with a blank password)
  • Strings: using strings
  • OCR: using tesseract
  2. Scripts
  • JavaScript beautifying and deobfuscation: using jsbeautifier
  3. Styles
  • Unminifying: using regular expressions

Note: images found in the CSS files are also processed.
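
As a rough illustration of the Styles handler and the note about CSS images, the sketch below unminifies CSS with regular expressions, so that each declaration lands on its own line for a line-oriented tool like grep, and collects `url(...)` references. The names `unminify_css` and `css_urls` and the patterns themselves are assumptions made for this example, not the expressions webgrep actually uses.

```python
import re

# Matches url(...) with optional quotes; may also catch fonts, not only images.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


def unminify_css(css):
    """Put each rule and declaration on its own line (rough heuristic).

    Limitation: a ";" inside a value, such as a data: URI, is also split.
    """
    css = re.sub(r"\s*\{\s*", " {\n  ", css)   # open a block and indent
    css = re.sub(r"\s*;\s*", ";\n  ", css)     # one declaration per line
    css = re.sub(r"\s*\}\s*", "\n}\n", css)    # close the block on its own line
    return css


def css_urls(css):
    """Return the targets of every url(...) reference found in the CSS."""
    return [match.group(1) for match in CSS_URL.finditer(css)]


if __name__ == "__main__":
    minified = "a{color:red;background:url(img/bg.png)}"
    print(unminify_css(minified))
    print(css_urls(minified))
```

The extracted references would still need to be resolved against the stylesheet's own URL before being downloaded and handed to the image handlers.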

πŸ‘ Supporters

Stargazers repo roster for @dhondta/webgrep

Forkers repo roster for @dhondta/webgrep

Back to top

More Repositories

  1. dronesploit (Python, 1,366 stars): Drone pentesting framework console
  2. awesome-executable-packing (495 stars): A curated list of awesome resources related to executable packing
  3. python-codext (Python, 265 stars): Python codecs extension featuring CLI tools for encoding/decoding anything
  4. python-sploitkit (Python, 229 stars): Devkit for building Metasploit-like consoles
  5. rpl-attacks (Python, 72 stars): RPL attacks framework for simulating WSN with a malicious mote based on Contiki
  6. tex-course-index-template (Python, 49 stars): A template for writing a condensed course index leveraging LaTeX indexing
  7. python-tinyscript (Python, 47 stars): Devkit for quickly building CLI tools with Python
  8. zotero-cli (Python, 40 stars): Tinyscript tool for sorting and exporting Zotero references based on pyzotero
  9. stegano-tools (34 stars): Collection of steganography tools for images and text
  10. AppmemDumper (Python, 24 stars): Forensics triage tool relying on Volatility and Foremost
  11. mkdocs-revealjs-template (CSS, 19 stars): Template of MkDocs + Reveal.js static documentation website
  12. bots-scheduler (Python, 17 stars): Cron-like system based on Nextdoor Scheduler, PyBots and Tinyscript
  13. tex-book-template (TeX, 15 stars): A template for writing a nice book with LaTeX
  14. peid (Python, 15 stars): Python implementation of the Packed Executable iDentifier (PEiD)
  15. python-pybots (Python, 13 stars): 🔧 Devkit for quickly creating client bots for remote communications
  16. recursive-compression (Python, 12 stars): Tinyscript tool for recursively (de)compressing nested archives using multiple algorithms (bzip2, rar, lzma, ...)
  17. malicious-macro-tester (Python, 9 stars): CLI tool for testing Office documents with macros using MaliciousMacroBot
  18. tex-master-thesis-template (TeX, 8 stars): A template for writing a nice master thesis dissertation with LaTeX
  19. python-asciistuff (Python, 7 stars): 🎨 Library for producing ASCII arts
  20. docker-packing-box (Python, 7 stars): Docker image gathering packers and tools for making datasets of packed executables and training machine learning models for packing detection
  21. bintropy (Python, 6 stars): Analysis tool for estimating the likelihood that a binary contains compressed or encrypted bytes
  22. pentest-for-beginners (HTML, 5 stars): PenTesting course made with Mkdocs/Reveal.js
  23. searchpass (Python, 4 stars): Tinyscript tool for searching for default passwords on various open source databases based on pybots
  24. scapl-search (Python, 2 stars): SCAPL search engine component.
  25. scapl-automation (Python, 2 stars): SCAPL automation system component.
  26. scapl-install (Shell, 1 star): SCAPL application installation files
  27. scapl-backend (1 star): SCAPL backend component.
  28. tex-poster-template (TeX, 1 star): A template for creating a nice scientific poster with LaTeX
  29. tex-cheat-sheet-template (TeX, 1 star): A template for creating a nice cheat sheet with LaTeX
  30. scapl-frontend (JavaScript, 1 star): SCAPL frontend component.