  • Stars: 405
  • Rank: 106,656 (Top 3 %)
  • Language: Dart
  • License: MIT License
  • Created over 8 years ago
  • Updated about 1 year ago

Fast link checker

linkcheck

Very fast link-checking.

[Figure: linkcheck versus the popular blc tool]

Philosophy:

A good utility is custom-made for a job. There are many link checkers out there, but none of them seems to be striving for the following set of goals.

Crawls fast

  • You want to run the link-checker at least before every deploy (on CI or manually). When it takes ages, you're less likely to do so.

  • linkcheck is currently several times faster than blc and all other link checkers that go to at least comparable depth. It is 40 times faster than the only tool that goes to the same depth (linkchecker).

Finds all relevant problems

  • No link-checker can guarantee correct results: the web is too flaky for that. But at least the tool should correctly parse the HTML (not just try to guess what's a URL and what isn't) and the CSS (for url(...) links).

    • PENDING: srcset support
  • linkcheck finds more than linklint and blc. It finds the same number of problems as the best alternative, linkchecker, or more.

Leaves out irrelevant problems

  • linkcheck doesn't attempt to render JavaScript. It would make it at least an order of magnitude slower and way more complex. (For example, what links and buttons should the tool attempt to click, and how many times? Should we only click visible links? How exactly do we detect broken links?) Validating SPAs is a very different problem than checking static links, and should be approached by dedicated tools.

  • linkcheck only supports http: and https:. It won't try to check FTP or telnet or nntp links.

    • Note: linkcheck currently ignores unsupported schemes like ftp: or mailto: or data: completely. This may change in the future to at least show an info-level warning.
  • linkcheck doesn't validate file system directories. Servers often behave very differently than file systems, so validating links on the file system often leads to both false positives and false negatives. Links should be checked in their natural habitat, and as close to the production environment as possible. You can (and should) run linkcheck on your localhost server, of course.

Good UX

  • Yes, a command line utility can have good or bad UX. It has mostly to do with giving sane defaults, not forcing users to learn new constructs, not making them type more than needed, and showing concise output.

  • The most frequent use cases should require only a few arguments.

  • linkcheck doesn't throttle itself on localhost.

  • linkcheck follows POSIX CLI standards (no @input and similar constructs like in linklint).

Brief and meaningful output

  • When everything works, you don't want to see a huge list of links.

    • In this scenario, linkcheck just outputs 'Perfect' and some stats on a single line.
  • When things are broken, you want to see exactly where the problem is, and you want to have it sorted in a sane way.

    • linkcheck lists broken links by their source URL first so that you can fix many links at once. It also sorts the URLs alphabetically, and shows both the exact location of the link (line:column) and the anchor text (or the tag if it wasn't an anchor).
  • For CI builds, you want non-zero exit code whenever there is a problem.

    • linkcheck returns status code 1 if there are warnings, and status code 2 if there are errors.
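As a rough sketch of how a CI script might act on these status codes: the helper name describe_linkcheck_status is made up for this illustration, and treating status 0 as "no problems" is an assumption implied by the text above rather than something stated explicitly.

```shell
#!/bin/sh
# Hypothetical helper: translate a linkcheck exit status into a label.
# Assumption: 0 = no problems, 1 = warnings, 2 = errors.
describe_linkcheck_status() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "warnings" ;;
    2) echo "errors" ;;
    *) echo "unexpected" ;;
  esac
}

# Possible use in a CI step (not executed here, since it needs linkcheck
# on the PATH and a running server):
#   linkcheck :4000
#   status=$?
#   echo "linkcheck result: $(describe_linkcheck_status "$status")"
#   [ "$status" -eq 2 ] && exit 1   # fail the build on errors only
```

Whether warnings should also fail the build is a policy choice; a stricter pipeline could simply exit with linkcheck's own status.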

It goes without saying that linkcheck fully respects definitions in robots.txt and throttles itself when accessing websites.

Installation

Direct download

  • Download the latest executable from the Releases page on GitHub. Pick the executable for your system (for example, linkcheck-win-x64.exe for a 64-bit machine running Microsoft Windows).

You should be able to immediately run this executable -- it has no external dependencies. For example, assuming you are on macOS and downloaded the file to the default downloads directory, you can go to your Terminal (or iTerm, or SSH) and run ./Downloads/linkcheck-mac-x64.

You can rename the file and move it to any directory. For example, on a Linux box, you might want to rename the executable to simply linkcheck, and move it to /usr/local/bin, $HOME/bin or another directory in your $PATH.
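The rename-and-move step could look roughly like the sketch below. The function name install_linkcheck and the file name linkcheck-linux-x64 are assumptions made for illustration; use the actual name of the file you downloaded.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded executable into a directory
# under the name "linkcheck" and make it executable.
install_linkcheck() {
  src=$1       # path to the downloaded file
  dest_dir=$2  # target directory, ideally one that is on your $PATH
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/linkcheck"
  chmod +x "$dest_dir/linkcheck"
}

# Example (file name is an assumption):
#   install_linkcheck "$HOME/Downloads/linkcheck-linux-x64" "$HOME/bin"
```

Writing to /usr/local/bin usually requires elevated permissions, which is why the example targets $HOME/bin.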

Docker image

Latest executable in a docker image:

docker run --rm tennox/linkcheck --help

(built from a repo mirror by @tennox)

From Source

Step 1. Install Dart

Follow the installation instructions for your platform from the Get the Dart SDK documentation.

For example, on a Mac, assuming you have homebrew, you just run:

$ brew tap dart-lang/dart
$ brew install dart

Step 2. Install linkcheck

Once Dart is installed, run:

$ dart pub global activate linkcheck

Pub installs executables into ~/.pub-cache/bin, which may not be on your path. You can fix that by adding the following to your shell's config file (.bashrc, .bash_profile, etc.):

export PATH="$PATH:$HOME/.pub-cache/bin"

Then either restart the terminal or run source ~/.bash_profile (assuming ~/.bash_profile is where you put the PATH export above).
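To confirm that the change took effect, you can check whether the directory is on your path. The sketch below is one way to do that; path_contains is a hypothetical helper, not part of Dart or linkcheck.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   if path_contains "$HOME/.pub-cache/bin"; then
#     echo "pub executables are on the PATH"
#   else
#     echo "add \$HOME/.pub-cache/bin to the PATH"
#   fi
```

The comparison is purely textual, so an entry written with a trailing slash or through a symlink will not be recognized.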

Docker

If you have Docker installed, you can build the image and use the container, which avoids a local Dart installation.

Build

In the project directory, for x86 and x64 architectures, run

docker build -t filiph/linkcheck .

On ARM architectures (Raspberry Pi, M1 Mac), run

docker build --platform linux/arm64 -t filiph/linkcheck .

Usage (container mode)

docker run filiph/linkcheck <URL>

All of the usage examples below are also valid when running in a container.

Usage (github action)

- uses: filiph/linkcheck@<version>
  with:
    arguments: <URL>

All of the usage guidelines below are also valid when running as a GitHub action.

Usage

If in doubt, run linkcheck -h. Here are some examples to get you started.

Localhost

Running linkcheck without arguments will try to crawl http://localhost:8080/ (which is the most common local server URL).

  • linkcheck to crawl the site and ignore external links
  • linkcheck -e to try external links

If you run your local server on http://localhost:4000/, for example, you can do:

  • linkcheck :4000 to crawl the site and ignore external links
  • linkcheck :4000 -e to try external links

linkcheck will not throttle itself when accessing localhost. It will go as fast as possible.

Deployed sites

  • linkcheck www.example.com to crawl www.example.com and ignore external links
  • linkcheck https://www.example.com to start directly on https
  • linkcheck www.example.com www.other.com to crawl both sites and check links between the two (but ignore external links outside those two sites)

Many entry points

Assuming you have a text file mysites.txt like this:

http://egamebook.com/
http://filiph.net/
https://alojz.cz/

You can run linkcheck -i mysites.txt and it will crawl all of them and also check links between them. This is useful for:

  1. Link-checking projects spanning many domains (or subdomains).
  2. Checking all your public websites / blogs / etc.

There's another use for this, and that is when you have a list of inbound links, like this:

https://www.dart.dev/
https://www.dart.dev/tools/
https://www.dart.dev/guides/

You probably want to make sure you never break your inbound links. For example, if a page changes URL, the previous URL should still work (redirecting to the new page when appropriate).

Where do you get a list of inbound links? Try your site's sitemap.xml as a starting point and, additionally, try something like the crawl error page in Google Webmaster Tools.
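As a starting point for the sitemap route, the sketch below pulls the URLs out of a sitemap and writes them one per line, which is the format the -i option expects. The helper name sitemap_urls and the file name inbound.txt are made up for this example, and the text matching is deliberately naive: it assumes each <loc> element sits on a single line with no surrounding whitespace or XML escaping.

```shell
#!/bin/sh
# Hypothetical helper: read a sitemap on stdin and print the contents
# of every <loc> element, one URL per line.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's#</loc>##'
}

# Example (not executed here, since it needs network access):
#   curl -s https://www.example.com/sitemap.xml | sitemap_urls > inbound.txt
#   linkcheck -i inbound.txt
```

For sitemaps that use CDATA sections, entity escapes, or sitemap index files, a real XML parser would be a safer choice.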

Skipping URLs

Sometimes, it is legitimate to ignore some failing URLs. This is done via the --skip-file option.

Let's say you're working on a site and a significant portion of it is currently under construction. You can create a file called my_skip_file.txt, for example, and fill it with regular expressions like so:

# Lines starting with a hash are comments.

admin/
\.s?css$
\#info

The file above includes a comment on line 1, which will be ignored. Line 2 is blank and will be ignored as well. Line 3 contains a broad regular expression that will make linkcheck ignore any link to a URL containing admin/ anywhere in it. Line 4 shows that there is full support for regular expressions: it will ignore URLs ending with .css and .scss. Line 5 shows the only special escape sequence. If you need to start your regular expression with a # (which linkcheck would normally parse as a comment), you can precede the # with a backslash (\). This will force linkcheck not to ignore the line. In this case, the regular expression on line 5 will match #info anywhere in the URL.

To use this file, you run linkcheck like this:

linkcheck example.com --skip-file my_skip_file.txt

Regular expressions are hard. If unsure, use the -d option to see what URLs your skip file is ignoring, exactly.
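If you want to reason about a skip file without running a crawl, the rules described above can be approximated in a few lines of shell. This is only a sketch, not linkcheck's actual implementation: the function names are hypothetical, and grep -E uses POSIX extended regular expressions, which differ in places from the regular expression syntax linkcheck itself uses.

```shell
#!/bin/sh
# Print the effective patterns of a skip file: drop blank lines and
# comment lines, and turn a leading "\#" into a literal "#".
skip_patterns() {
  grep -v -e '^$' -e '^#' "$1" | sed 's/^\\#/#/'
}

# Succeed if URL $2 matches any pattern from skip file $1.
url_is_skipped() {
  patterns=$(mktemp)
  skip_patterns "$1" > "$patterns"
  if printf '%s\n' "$2" | grep -E -q -f "$patterns"; then
    result=0
  else
    result=1
  fi
  rm -f "$patterns"
  return "$result"
}

# Example:
#   url_is_skipped my_skip_file.txt "http://example.com/admin/users" \
#     && echo "would be skipped"
```

For anything beyond a quick sanity check, the -d option remains the authoritative way to see what linkcheck really skips.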

To use a skip file while running linkcheck through Docker, create a directory to use as a Docker volume and put your skip file in it. Then use a command similar to the following (assuming the directory is named skipfiles and the skip file is named skipfile.txt):

docker run -v "$(pwd)/skipfiles/:/skipfiles/" filiph/linkcheck http://example.com/ --skip-file /skipfiles/skipfile.txt

User agent

The tool identifies itself to servers with the following user agent string:

linkcheck tool (https://github.com/filiph/linkcheck)

Releasing a new version

  1. Commit all your changes, including updates to CHANGELOG, and including updating the version number in pubspec.yaml and lib/linkcheck.dart. Let's say your new version number is 3.4.56. That number should be reflected in all three files.
  2. Tag the last commit with the same version number. In our case, it would be 3.4.56.
  3. Push to master.

This will run the GitHub Actions script in .github/workflows/release.yml, building binaries and placing a new release into github.com/filiph/linkcheck/releases.

To publish the release to the GitHub Actions Marketplace as well, you currently need to manually click Edit and then Update release on the release page once. No changes are needed. (Source: GitHub Community)
