  • Stars: 121
  • Rank 292,268 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Hydra: a multithreaded site-crawling link checker in Python standard library

Hydra: multithreaded site-crawling link checker in Python

A Python program that crawls (slithers 🐍) a website for links and prints a YAML report of broken links.

Requires

Python 3.6 or higher.

There are no external dependencies, Neo.

Usage

$ python hydra.py -h
usage: hydra.py [-h] [--config CONFIG] URL

Positional arguments:

  • URL: The URL of the website to crawl. The URL must be absolute and include the scheme, e.g. https://example.com.

Optional arguments:

  • -h, --help: Show help message and exit
  • --config CONFIG, -c CONFIG: Path to a configuration file
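
As a rough illustration of the arguments above, the sketch below builds an equivalent parser with the standard library and rejects URLs that lack a scheme or host. The names absolute_url and build_parser are hypothetical, and this is not Hydra's own code; it only mirrors the documented usage line.

```python
import argparse
from urllib.parse import urlparse


def absolute_url(value: str) -> str:
    """Accept only URLs with both a scheme and a host, e.g. https://example.com."""
    parts = urlparse(value)
    if not parts.scheme or not parts.netloc:
        raise argparse.ArgumentTypeError(
            f"{value!r} is not absolute; include the scheme, e.g. https://example.com"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the usage line; Hydra's real parser may differ.
    parser = argparse.ArgumentParser(prog="hydra.py")
    parser.add_argument("URL", type=absolute_url, help="The URL of the website to crawl")
    parser.add_argument("--config", "-c", help="Path to a configuration file")
    return parser


# Parse an explicit argument list rather than sys.argv, for demonstration.
args = build_parser().parse_args(["https://example.com", "--config", "hydra-config.json"])
print(args.URL, args.config)
```

Validating the URL in the type callback means argparse reports the problem alongside the usage message before any crawling starts.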

Hydra writes the broken links report to stdout in YAML format. To save it to a file, redirect the output:

python hydra.py [URL] > [PATH/TO/FILE.yaml]

You can add the current date to the filename using a command substitution, such as:

python hydra.py [URL] > /path/to/$(date '+%Y_%m_%d')_report.yaml
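
If you would rather drive Hydra from a Python script than from the shell, the same dated filename can be built with the standard library. This is a minimal sketch that assumes hydra.py sits in the current working directory; report_path and save_report are hypothetical helper names, not part of Hydra.

```python
import subprocess
import sys
from datetime import date
from pathlib import Path


def report_path(directory: str, day: date) -> Path:
    """Build a name like 2024_01_31_report.yaml, matching date '+%Y_%m_%d'."""
    return Path(directory) / f"{day.strftime('%Y_%m_%d')}_report.yaml"


def save_report(url: str, directory: str = ".") -> Path:
    """Run hydra.py (assumed to be in the current directory) and save its stdout."""
    target = report_path(directory, date.today())
    with target.open("w", encoding="utf-8") as handle:
        # check=False: a non-zero exit may simply mean broken links were found.
        subprocess.run([sys.executable, "hydra.py", url], stdout=handle, check=False)
    return target
```

Calling save_report("https://example.com", "/path/to") would then write the report to a file named for today's date in that directory.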

To see how long Hydra takes to check your site, prefix the command with time:

time python hydra.py [URL]

GitHub Action

To run Hydra as part of an automated process, use the link-snitch GitHub Action.

Configuration

Hydra accepts an optional JSON configuration file that overrides specific parameters, for example:

{
    "OK": [
        200,
        999,
        403
    ],
    "attrs": [
        "href"
    ],
    "exclude_scheme_prefixes": [
        "tel"
    ],
    "tags": [
        "a",
        "img"
    ],
    "threads": 25,
    "timeout": 30,
    "graceful_exit": "True"
}

To use a configuration file, supply the filename:

python hydra.py https://example.com --config ./hydra-config.json
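
One way to picture how a configuration file interacts with the defaults listed under Possible settings is a simple merge: start from the defaults and let the file override them. The sketch below is an assumption about that behavior, not Hydra's actual implementation. The helper names, the False default for graceful_exit, the rejection of unknown keys, and the handling of the string "True" are all illustrative choices.

```python
import copy
import json
from pathlib import Path

# Defaults as documented under "Possible settings".
# The README states no default for graceful_exit, so False is an assumption.
DEFAULTS = {
    "OK": [200, 999],
    "attrs": ["href", "src"],
    "exclude_scheme_prefixes": ["tel:", "javascript:"],
    "tags": ["a", "link", "img", "script"],
    "threads": 50,
    "timeout": 60,
    "graceful_exit": False,
}


def merge_settings(user: dict) -> dict:
    """Return a copy of the defaults, overridden by the keys present in user."""
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        # Assumption: this sketch rejects unknown keys; Hydra may simply ignore them.
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    settings = copy.deepcopy(DEFAULTS)
    settings.update(user)
    # The example file uses the string "True", so accept strings and booleans alike.
    settings["graceful_exit"] = str(settings["graceful_exit"]).lower() == "true"
    return settings


def load_settings(path: str) -> dict:
    """Read a JSON configuration file and merge it over the defaults."""
    return merge_settings(json.loads(Path(path).read_text(encoding="utf-8")))


print(merge_settings({"threads": 25, "timeout": 30, "graceful_exit": "True"}))
```

Under this approach, any setting left out of the file keeps its default value.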

Possible settings:

  • OK - HTTP response codes to consider as a successful link check. Defaults to [200, 999].
  • attrs - Attributes of the HTML tags to check for links. Defaults to ["href", "src"].
  • exclude_scheme_prefixes - URL scheme prefixes to exclude from checking. Defaults to ["tel:", "javascript:"].
  • tags - HTML tags to check for links. Defaults to ["a", "link", "img", "script"].
  • threads - Maximum workers to run. Defaults to 50.
  • timeout - Maximum seconds to wait for HTTP response. Defaults to 60.
  • graceful_exit - If set to True, return exit code 0 even when broken links are present; otherwise, return exit code 1 when broken links are present.
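
To make the tags, attrs, and exclude_scheme_prefixes settings more concrete, here is a minimal sketch of how they could drive link extraction using only the standard library. LinkCollector is a hypothetical class written for illustration; it is not taken from Hydra, whose parsing details may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from the configured tags and attributes."""

    def __init__(self, base_url, tags, attrs, exclude_scheme_prefixes):
        super().__init__()
        self.base_url = base_url
        self.tags = set(tags)
        self.attrs = set(attrs)
        # str.startswith accepts a tuple of prefixes.
        self.excluded = tuple(exclude_scheme_prefixes)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            return
        for name, value in attrs:
            if name not in self.attrs or not value:
                continue
            if value.startswith(self.excluded):
                continue
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, value))


page = """
<a href="/about">About</a>
<a href="tel:+15550100">Call</a>
<img src="logo.png">
<p href="ignored">not a link tag</p>
"""

collector = LinkCollector(
    "https://example.com/blog/",
    tags=["a", "link", "img", "script"],
    attrs=["href", "src"],
    exclude_scheme_prefixes=["tel:", "javascript:"],
)
collector.feed(page)
print(collector.links)
# ['https://example.com/about', 'https://example.com/blog/logo.png']
```

The tel: link is skipped by the prefix filter, and the p element is skipped because it is not among the configured tags.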

Test

Run:

python -m unittest tests/test.py
