Very fast link checker for CI.

hyperlink

A command-line tool to find broken links in your static site.

  • Fast. docs.sentry.io produces 1.1 GB of HTML files. hyperlink handles this amount of data in 4 seconds on a MacBook Pro 2018. See Alternatives for a performance comparison.

  • Pay for what you need. By default, hyperlink checks for hard 404s in internal links only. Anything beyond that is opt-in. See Options for a list of features to enable.

  • Maps errors back to source files. If your static site was created from Markdown files, hyperlink can try to find the original broken link by fuzzy-matching the content around it. See the --sources option.

  • Supports traversing file-system paths only, no arbitrary URLs.

    • No support for the <base> tag.

    • No support for external links. It does not know how to speak HTTP.

    • Even if you don't have a static site, you can put hyperlink to work by first downloading the entire website using e.g. suckit. In certain cases this is faster than other tools too.

  • Does not honor robots.txt. A broken link is still broken for users even if not indexed by Google.

  • Does not parse CSS files, as broken links in CSS have not been a practical concern for us. We are concerned about broken links in the page content, not the chrome around it.

  • Only supports UTF-8 encoded HTML files.

Installation and Usage

Download the latest binary and:

# Check a folder of HTML
./hyperlink public/

# Also validate anchors
./hyperlink public/ --check-anchors

# src/ is a folder of Markdown. Show original Markdown file paths in errors
./hyperlink public/ --sources src/

GitHub action

- uses: untitaker/[email protected]
  with:
    args: public/ --sources src/

NPM

npm install -g @untitaker/hyperlink
hyperlink public/ --sources src/

Docker

docker run -v $PWD:/check ghcr.io/untitaker/hyperlink:0.1.29 /check/public/ --sources /check/src/

# specific commit
docker run -v $PWD:/check ghcr.io/untitaker/hyperlink:sha-82ca78c /check/public/ --sources /check/src

See all available tags

From source

cargo install hyperlink  # latest stable release
cargo install --git https://github.com/untitaker/hyperlink  # latest git SHA

Options

When invoked without options, hyperlink only checks for 404s of internal links. However, it can do more.

  • -j/--jobs: How many threads to spawn for parsing HTML. By default hyperlink will attempt to saturate your CPU.

  • --check-anchors: Opt-in, check for validity of anchors on pages. Broken anchors are considered warnings, meaning that hyperlink will exit 2 if there are only broken anchors but no hard 404s.

  • --sources: A folder of Markdown files that were the input for the HTML hyperlink has to check. This is used to provide better error messages that point at the actual file to edit. hyperlink does very simple content-based matching to figure out which Markdown files may have been involved in the creation of an HTML file.

    Why not just crawl and validate links in Markdown at this point? Answer:

    • There are countless proprietary extensions to Markdown out there for creating intra-page links that are generally not supported by link checking tools.

    • The structure of your markdown content does not necessarily match the structure of your HTML (i.e. what the user actually sees). With this setup, hyperlink does not have to assume anything about your build pipeline.

  • --github-actions: Emit GitHub actions errors, i.e. add error messages in-line to PR diffs. This is only useful with --sources set.

    If you are using hyperlink through the GitHub action this option is already set. It is only useful if you are downloading/building and running hyperlink yourself in CI.
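For a self-managed CI job, the options above can be combined in a small wrapper. The following is a minimal sketch, not part of hyperlink itself: the function name run_hyperlink_ci and the HYPERLINK_BIN variable are hypothetical, and it assumes the binary was downloaded to ./hyperlink as in the usage examples. It only uses flags described in this section.

```shell
#!/bin/sh
# Hypothetical variable: path to the hyperlink binary (defaults to ./hyperlink).
HYPERLINK_BIN="${HYPERLINK_BIN:-./hyperlink}"

# Hypothetical helper: check a built site, validate anchors, map errors back
# to the Markdown sources, and emit GitHub Actions annotations.
#   $1 = folder of generated HTML, $2 = folder of Markdown sources
run_hyperlink_ci() {
    site_dir=$1
    sources_dir=$2
    "$HYPERLINK_BIN" "$site_dir" --check-anchors --sources "$sources_dir" --github-actions
}
```

A CI step would then call run_hyperlink_ci public/ src/ after the site has been built.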

Exit codes

  • exit 1: There have been errors (hard 404s)
  • exit 2: There have been only warnings (broken anchors)
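A CI script that should block on hard 404s but merely report broken anchors can branch on these codes. The sketch below shows one way to do that. The function names classify_hyperlink_exit and check_links are made up for illustration, and it assumes that exit 0 means no problems were found; any other code is treated as unexpected.

```shell
#!/bin/sh
# Hypothetical helper: map a hyperlink exit status to a short label.
# Codes 1 and 2 come from the list above; treating 0 as "ok" is an assumption.
classify_hyperlink_exit() {
    case "$1" in
        0) echo "ok" ;;
        1) echo "errors" ;;
        2) echo "warnings" ;;
        *) echo "unexpected" ;;
    esac
}

# Hypothetical helper: run the given command and succeed unless it reported
# hard 404s (exit 1) or an unexpected status. Warnings do not fail the job.
check_links() {
    status=0
    "$@" || status=$?
    label=$(classify_hyperlink_exit "$status")
    echo "link check: $label (exit $status)"
    [ "$label" = "ok" ] || [ "$label" = "warnings" ]
}
```

For example, check_links ./hyperlink public/ --check-anchors lets a build pass when only anchors are broken, while still printing the outcome to the job log.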

Alternatives

(Roughly ranked by performance, determined by some unserious benchmark. This section contains partially dated measurements and is not continuously updated with regard to either performance or feature set.)

None of the listed alternatives has an equivalent to hyperlink's --sources and --github-actions features.

  • lychee, like hyperlink, is a great choice for obscenely large static sites. Additionally it can check external/outbound links. An invocation of lychee --offline public/ is more or less equivalent to hyperlink public/.

  • liche seems to be fairly fast, but is unmaintained.

  • htmltest seems to be fairly fast as well, and is more of a general-purpose HTML linting tool.

  • muffet seems to have performance similar to htmltest. We tested muffet with http-server and webfsd without noticing a change in timings.

  • linkcheck is faster than linkchecker but still quite slow on large sites.

    We tried linkcheck together with http-server on localhost, although that does not seem to be the bottleneck at all.

  • wummel/linkchecker seems to be fairly feature-rich, but was a non-starter due to performance. This applies to countless other link checkers we tried that are not mentioned here.

Testimonials

We use Hyperlink to check for dead links on Graphviz's static-site user documentation, because:

  • Hyperlink is blazingly fast, checking 700 HTML pages in 220ms (default) and 850ms (with --check-anchors).
  • Hyperlink's single-binary release, with no library dependencies, was trivial to integrate into our continuous integration tests.
  • High coverage: Hyperlink immediately spotted over a thousand broken page links within both <a> tags and HTML redirects, and a further 62 broken anchor-links with --check-anchors.
  • Hyperlink's design decision to crawl only static files (avoiding HTTP), avoids test flakiness from network requests, allowing me to confidently block merging if Hyperlink reports an error.

In conclusion, Hyperlink fills the "static site continuous testing" niche really nicely.

-- Mark Hansen, Graphviz documentation maintainer

License

Licensed under the MIT License, see ./LICENSE.
