  • Stars: 972
  • Rank 45,428 (Top 1.0 %)
  • Language: Go
  • License: MIT License
  • Created about 6 years ago
  • Updated 9 months ago

unfurl

Pull out bits of URLs provided on stdin

Install

If you have Go installed and configured:

▶ go install github.com/tomnomnom/unfurl@latest

Otherwise, download the latest binary for your platform, extract it, and move it somewhere in your $PATH (e.g. /usr/bin/):

▶ wget https://github.com/tomnomnom/unfurl/releases/download/v0.0.1/unfurl-linux-amd64-0.0.1.tgz
▶ tar xzf unfurl-linux-amd64-0.0.1.tgz
▶ sudo mv unfurl /usr/bin/

Usage

unfurl works with URLs provided on stdin; they might come from a file like this one:

▶ cat urls.txt
https://sub.example.com/users?id=123&name=Sam
https://sub.example.com/orgs?org=ExCo#about
http://example.net/about#contact

Domains

You can extract the domains from the URLs with the domains mode:

▶ cat urls.txt | unfurl domains
sub.example.com
sub.example.com
example.net

If you don't want to output duplicate values you can use the -u or --unique flag:

▶ cat urls.txt | unfurl --unique domains
sub.example.com
example.net

The -u/--unique flag works for all modes.
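
To illustrate what the domains mode combined with --unique produces, here is a minimal Go sketch built on the standard net/url package. It is not unfurl's actual source; the function name uniqueHosts is made up for this example, and the sketch assumes that "unique" means keeping the first occurrence of each hostname in input order.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// uniqueHosts (hypothetical helper) returns the hostname of every
// parseable URL in lines, keeping only the first occurrence of each.
func uniqueHosts(lines []string) []string {
	seen := make(map[string]bool)
	var hosts []string
	for _, line := range lines {
		u, err := url.Parse(strings.TrimSpace(line))
		if err != nil || u.Hostname() == "" {
			continue // skip anything that is not a URL with a host
		}
		h := u.Hostname()
		if !seen[h] {
			seen[h] = true
			hosts = append(hosts, h)
		}
	}
	return hosts
}

func main() {
	// Read URLs from stdin, one per line, as unfurl does.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for _, h := range uniqueHosts(lines) {
		fmt.Println(h)
	}
}
```

Fed the urls.txt file above on stdin, this sketch should print sub.example.com followed by example.net.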

Apex Domains

You can extract the apex part of the domain (e.g. the example.com in http://sub.example.com) using the apexes mode:

▶ cat urls.txt | unfurl -u apexes
example.com
example.net

Paths

▶ cat urls.txt | unfurl paths
/users
/orgs
/about

Query String Keys

▶ cat urls.txt | unfurl keys
id
name
org

Query String Values

▶ cat urls.txt | unfurl values
123
Sam
ExCo

Query String Key/Value Pairs

▶ cat urls.txt | unfurl keypairs
id=123
name=Sam
org=ExCo
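
The keys, values and keypairs modes all read the same part of the URL: the query string. The following Go sketch shows one way to pull key=value pairs out of a URL in the order they appear. It is an illustration only, not unfurl's implementation; queryPairs is a hypothetical name, and splitting the raw query by hand (rather than using a map) is a choice made here to keep the output order stable.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// queryPairs (hypothetical helper) returns the key=value pairs from
// rawURL's query string, in the order they appear in the URL.
func queryPairs(rawURL string) ([]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	var pairs []string
	for _, part := range strings.Split(u.RawQuery, "&") {
		if part == "" {
			continue // no query string, or an empty segment like "a=1&&b=2"
		}
		k, v, _ := strings.Cut(part, "=")
		key, err := url.QueryUnescape(k)
		if err != nil {
			continue // skip pairs with invalid percent-encoding
		}
		val, err := url.QueryUnescape(v)
		if err != nil {
			continue
		}
		pairs = append(pairs, key+"="+val)
	}
	return pairs, nil
}

func main() {
	urls := []string{
		"https://sub.example.com/users?id=123&name=Sam",
		"https://sub.example.com/orgs?org=ExCo#about",
		"http://example.net/about#contact",
	}
	for _, raw := range urls {
		pairs, err := queryPairs(raw)
		if err != nil {
			continue
		}
		for _, p := range pairs {
			fmt.Println(p)
		}
	}
}
```

The third URL has no query string, so it contributes no lines, which matches the three-line output shown above.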

JSON

▶ cat urls.txt | unfurl json
{"scheme":"https","opaque":"","user":"","host":"sub.example.com","path":"/users","raw_path":"","raw_query":"id=123\u0026name=Sam","fragment":"","parameters":[{"key":"id","value":"123"},{"key":"name","value":"Sam"}],"url":"https://sub.example.com/users?id=123\u0026name=Sam","domain":"sub.example.com","subdomain":"sub","root":"example","tld":"com","apex":"example.com","port":"","extension":""}
{"scheme":"https","opaque":"","user":"","host":"sub.example.com","path":"/orgs","raw_path":"","raw_query":"org=ExCo","fragment":"about","parameters":[{"key":"org","value":"ExCo"}],"url":"https://sub.example.com/orgs?org=ExCo#about","domain":"sub.example.com","subdomain":"sub","root":"example","tld":"com","apex":"example.com","port":"","extension":""}
{"scheme":"http","opaque":"","user":"","host":"example.net","path":"/about","raw_path":"","raw_query":"","fragment":"contact","parameters":null,"url":"http://example.net/about#contact","domain":"example.net","subdomain":"","root":"example","tld":"net","apex":"example.net","port":"","extension":""}
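
Because the json mode emits one object per line, the output is easy to consume from another program. The sketch below decodes a single line in Go with encoding/json. The struct covers only a handful of the fields visible in the sample output; the type and function names (record, parameter, decodeRecord) are invented for this example and are not part of unfurl.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parameter mirrors one entry of the "parameters" array in the output above.
type parameter struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// record holds a subset of the fields shown in the sample output;
// fields not listed here are ignored by the decoder.
type record struct {
	Scheme     string      `json:"scheme"`
	Host       string      `json:"host"`
	Path       string      `json:"path"`
	Fragment   string      `json:"fragment"`
	Parameters []parameter `json:"parameters"`
	Apex       string      `json:"apex"`
}

// decodeRecord (hypothetical helper) parses one line of JSON output.
func decodeRecord(line string) (record, error) {
	var r record
	err := json.Unmarshal([]byte(line), &r)
	return r, err
}

func main() {
	// A shortened copy of the second line of the sample output.
	line := `{"scheme":"https","host":"sub.example.com","path":"/orgs","fragment":"about","parameters":[{"key":"org","value":"ExCo"}],"apex":"example.com"}`
	r, err := decodeRecord(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(r.Apex, r.Path, r.Fragment)
	for _, p := range r.Parameters {
		fmt.Println(p.Key, p.Value)
	}
}
```

Note that a URL without a query string yields "parameters":null, which decodes to an empty slice, so ranging over Parameters is safe either way.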

Custom Formats

You can use the format mode to specify a custom output format:

▶ cat urls.txt | unfurl format %d%p
sub.example.com/users
sub.example.com/orgs
example.net/about

The available format directives are:

%%  A literal percent character
%s  The request scheme (e.g. https)
%u  The user info (e.g. user:pass)
%d  The domain (e.g. sub.example.com)
%S  The subdomain (e.g. sub)
%r  The root of domain (e.g. example)
%t  The TLD (e.g. com)
%P  The port (e.g. 8080)
%p  The path (e.g. /users)
%e  The path's file extension (e.g. jpg, html)
%q  The raw query string (e.g. a=1&b=2)
%f  The page fragment (e.g. page-section)
%@  Inserts an @ if user info is specified
%:  Inserts a colon if a port is specified
%?  Inserts a question mark if a query string exists
%#  Inserts a hash if a fragment exists
%a  Authority (alias for %u%@%d%:%P)

Any characters that don't match a format directive remain untouched:

▶ cat urls.txt | unfurl -u format "%d (%s)"
sub.example.com (https)
example.net (http)

Note that if a URL does not include the data requested, there will be no output for that URL:

▶ echo http://example.com | unfurl format "%P"
▶ echo http://example.com:8080 | unfurl format "%P"
8080
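
The way format directives expand can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not unfurl's code: expandURL is a hypothetical name, only a subset of the directives listed above is handled (%%, %s, %d, %P, %p, %q, %f, %:, %? and %#), and unrecognised directives are simply copied through unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// expandURL (hypothetical helper) parses rawURL and expands a subset of
// the format directives described above. Other characters pass through.
func expandURL(rawURL, format string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	rs := []rune(format)
	for i := 0; i < len(rs); i++ {
		if rs[i] != '%' || i == len(rs)-1 {
			b.WriteRune(rs[i]) // ordinary character, or a trailing '%'
			continue
		}
		i++ // look at the directive character after '%'
		switch rs[i] {
		case '%':
			b.WriteByte('%')
		case 's':
			b.WriteString(u.Scheme)
		case 'd':
			b.WriteString(u.Hostname())
		case 'P':
			b.WriteString(u.Port())
		case 'p':
			b.WriteString(u.EscapedPath())
		case 'q':
			b.WriteString(u.RawQuery)
		case 'f':
			b.WriteString(u.Fragment)
		case ':':
			if u.Port() != "" {
				b.WriteByte(':')
			}
		case '?':
			if u.RawQuery != "" {
				b.WriteByte('?')
			}
		case '#':
			if u.Fragment != "" {
				b.WriteByte('#')
			}
		default:
			// Assumption for this sketch: leave unknown directives as-is.
			b.WriteRune('%')
			b.WriteRune(rs[i])
		}
	}
	return b.String(), nil
}

func main() {
	out, err := expandURL("https://sub.example.com/users?id=123&name=Sam", "%d%p")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if out != "" { // print nothing when the URL lacks the requested data
		fmt.Println(out)
	}
}
```

The conditional directives (%:, %? and %#) are what make it possible to rebuild a URL without stray punctuation: a format such as "%s://%d%:%P%p%?%q%#%f" only emits the colon, question mark or hash when the matching component is present.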

Help

▶ unfurl -h
Format URLs provided on stdin

Usage:
  unfurl [OPTIONS] [MODE] [FORMATSTRING]

Options:
  -u, --unique   Only output unique values
  -v, --verbose  Verbose mode (output URL parse errors)

Modes:
  keys     Keys from the query string (one per line)
  values   Values from the query string (one per line)
  keypairs Key=value pairs from the query string (one per line)
  domains  The hostname (e.g. sub.example.com)
  paths    The request path (e.g. /users)
  apexes   The apex domain (e.g. example.com from sub.example.com)
  json     JSON encoded url/format objects
  format   Specify a custom format (see below)

Format Directives:
  %%  A literal percent character
  %s  The request scheme (e.g. https)
  %u  The user info (e.g. user:pass)
  %d  The domain (e.g. sub.example.com)
  %S  The subdomain (e.g. sub)
  %r  The root of domain (e.g. example)
  %t  The TLD (e.g. com)
  %P  The port (e.g. 8080)
  %p  The path (e.g. /users)
  %e  The path's file extension (e.g. jpg, html)
  %q  The raw query string (e.g. a=1&b=2)
  %f  The page fragment (e.g. page-section)
  %@  Inserts an @ if user info is specified
  %:  Inserts a colon if a port is specified
  %?  Inserts a question mark if a query string exists
  %#  Inserts a hash if a fragment exists
  %a  Authority (alias for %u%@%d%:%P)

Examples:
  cat urls.txt | unfurl keys
  cat urls.txt | unfurl format %s://%d%p?%q
