Pull out bits of URLs provided on stdin.
If you have Go installed and configured:
▶ go install github.com/tomnomnom/unfurl@latest
Otherwise download the latest binary for your platform, extract it and move it to somewhere in your $PATH (e.g. /usr/bin/):
▶ wget https://github.com/tomnomnom/unfurl/releases/download/v0.0.1/unfurl-linux-amd64-0.0.1.tgz
▶ tar xzf unfurl-linux-amd64-0.0.1.tgz
▶ sudo mv unfurl /usr/bin/
unfurl works with URLs provided on stdin; they might come from a file like this one:
▶ cat urls.txt
https://sub.example.com/users?id=123&name=Sam
https://sub.example.com/orgs?org=ExCo#about
http://example.net/about#contact
You can extract the domains from the URLs with the domains mode:
▶ cat urls.txt | unfurl domains
sub.example.com
sub.example.com
example.net
If you don't want to output duplicate values you can use the -u or --unique flag:
▶ cat urls.txt | unfurl --unique domains
sub.example.com
example.net
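As a rough illustration of what the domains mode and the --unique flag do, the following Python sketch extracts hostnames with the standard library's urllib.parse and drops repeats while keeping first-seen order. It is not unfurl's implementation (unfurl is installed as a Go program); the function name unique_domains is made up for this example, and first-seen ordering is an assumption based on the output shown above.

```python
from urllib.parse import urlsplit


def unique_domains(urls):
    """Yield each URL's hostname once, in first-seen order."""
    seen = set()
    for url in urls:
        host = urlsplit(url.strip()).hostname
        # Skip lines with no host and hosts that were already emitted.
        if host and host not in seen:
            seen.add(host)
            yield host


urls = [
    "https://sub.example.com/users?id=123&name=Sam",
    "https://sub.example.com/orgs?org=ExCo#about",
    "http://example.net/about#contact",
]
print("\n".join(unique_domains(urls)))
```

Tracking seen values in a set keeps the check cheap per line, which matters when the input is a long stream rather than a three-line file.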
The -u/--unique flag works for all modes.
You can extract the apex part of the domain (e.g. the example.com in http://sub.example.com) using the apexes mode:
▶ cat urls.txt | unfurl -u apexes
example.com
example.net
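To make the idea of an apex domain concrete, here is a deliberately naive Python sketch that keeps the last two labels of the hostname. This is an assumption made for illustration only: it gives the wrong answer for multi-label suffixes such as co.uk, which is why real tools generally consult the Public Suffix List. The function name naive_apex is hypothetical and is not part of unfurl.

```python
from urllib.parse import urlsplit


def naive_apex(url):
    """Return the last two labels of the hostname, or None if there are fewer."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # ASSUMPTION: the public suffix is a single label such as "com" or "net".
    if len(labels) < 2 or "" in labels:
        return None
    return ".".join(labels[-2:])


print(naive_apex("http://sub.example.com"))  # example.com
```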
▶ cat urls.txt | unfurl paths
/users
/orgs
/about
▶ cat urls.txt | unfurl keys
id
name
org
▶ cat urls.txt | unfurl values
123
Sam
ExCo
▶ cat urls.txt | unfurl keypairs
id=123
name=Sam
org=ExCo
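The keys, values and keypairs modes are three views of the same parsed query string. A minimal Python sketch of that relationship, using urllib.parse from the standard library, might look like this. The helper name query_pairs is made up for the example, and how unfurl itself treats blank or repeated parameters is not something this sketch claims to reproduce.

```python
from urllib.parse import urlsplit, parse_qsl


def query_pairs(url):
    """Return the query string of url as a list of (key, value) tuples."""
    # keep_blank_values=True keeps parameters such as "?debug=" in the result.
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


pairs = query_pairs("https://sub.example.com/users?id=123&name=Sam")
keys = [k for k, _ in pairs]
values = [v for _, v in pairs]
keypairs = [f"{k}={v}" for k, v in pairs]
print(keys, values, keypairs)
```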
▶ cat urls.txt | unfurl json
{"scheme":"https","opaque":"","user":"","host":"sub.example.com","path":"/users","raw_path":"","raw_query":"id=123\u0026name=Sam","fragment":"","parameters":[{"key":"id","value":"123"},{"key":"name","value":"Sam"}],"url":"https://sub.example.com/users?id=123\u0026name=Sam","domain":"sub.example.com","subdomain":"sub","root":"example","tld":"com","apex":"example.com","port":"","extension":""}
{"scheme":"https","opaque":"","user":"","host":"sub.example.com","path":"/orgs","raw_path":"","raw_query":"org=ExCo","fragment":"about","parameters":[{"key":"org","value":"ExCo"}],"url":"https://sub.example.com/orgs?org=ExCo#about","domain":"sub.example.com","subdomain":"sub","root":"example","tld":"com","apex":"example.com","port":"","extension":""}
{"scheme":"http","opaque":"","user":"","host":"example.net","path":"/about","raw_path":"","raw_query":"","fragment":"contact","parameters":null,"url":"http://example.net/about#contact","domain":"example.net","subdomain":"","root":"example","tld":"net","apex":"example.net","port":"","extension":""}
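Because the json mode emits one JSON object per line, its output is easy to consume from a script. The Python sketch below parses the first line of the example output above with the standard json module; the variable names are illustrative, and only field names that appear in that output are used.

```python
import json

# One line of `unfurl json` output, copied from the first example record.
line = r'{"scheme":"https","opaque":"","user":"","host":"sub.example.com","path":"/users","raw_path":"","raw_query":"id=123\u0026name=Sam","fragment":"","parameters":[{"key":"id","value":"123"},{"key":"name","value":"Sam"}],"url":"https://sub.example.com/users?id=123\u0026name=Sam","domain":"sub.example.com","subdomain":"sub","root":"example","tld":"com","apex":"example.com","port":"","extension":""}'

record = json.loads(line)

# "parameters" is null when the URL has no query string, so fall back to [].
params = {p["key"]: p["value"] for p in record["parameters"] or []}

print(record["apex"], record["path"], params)
```

The \u0026 sequences in the raw output are just JSON escapes for an ampersand; json.loads turns them back into &.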
You can use the format mode to specify a custom output format:
▶ cat urls.txt | unfurl format %d%p
sub.example.com/users
sub.example.com/orgs
example.net/about
The available format directives are:
%%  A literal percent character
%s  The request scheme (e.g. https)
%u  The user info (e.g. user:pass)
%d  The domain (e.g. sub.example.com)
%S  The subdomain (e.g. sub)
%r  The root of domain (e.g. example)
%t  The TLD (e.g. com)
%P  The port (e.g. 8080)
%p  The path (e.g. /users)
%e  The path's file extension (e.g. jpg, html)
%q  The raw query string (e.g. a=1&b=2)
%f  The page fragment (e.g. page-section)
%@  Inserts an @ if user info is specified
%:  Inserts a colon if a port is specified
%?  Inserts a question mark if a query string exists
%#  Inserts a hash if a fragment exists
%a  Authority (alias for %u%@%d%:%P)
Any characters that don't match a format directive remain untouched:
▶ cat urls.txt | unfurl -u format "%d (%s)"
sub.example.com (https)
example.net (http)
Note that if a URL does not include the data requested, there will be no output for that URL:
▶ echo http://example.com | unfurl format "%P"
▶ echo http://example.com:8080 | unfurl format "%P"
8080
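The behaviour of the format mode can be approximated in a few lines of Python. The sketch below covers only the directives that map directly onto urllib.parse fields (%S, %r, %t and %e are left out), copies non-directive characters through unchanged, and returns None when the result is empty. The function name expand and the "empty result means no output" rule are assumptions made for this illustration, not a description of unfurl's internals.

```python
from urllib.parse import urlsplit


def expand(fmt, url):
    """Expand a subset of unfurl-style directives in fmt for one URL."""
    u = urlsplit(url)
    user = u.netloc.rpartition("@")[0]  # "user:pass", or "" if absent
    port = "" if u.port is None else str(u.port)
    host = u.hostname or ""
    fields = {
        "%": "%", "s": u.scheme, "u": user, "d": host, "P": port,
        "p": u.path, "q": u.query, "f": u.fragment,
        # Conditional directives expand only when their part is present.
        "@": "@" if user else "", ":": ":" if port else "",
        "?": "?" if u.query else "", "#": "#" if u.fragment else "",
    }
    fields["a"] = user + fields["@"] + host + fields[":"] + port
    out, i = [], 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt) and fmt[i + 1] in fields:
            out.append(fields[fmt[i + 1]])
            i += 2
        else:
            # Anything that is not a known directive is copied through unchanged.
            out.append(ch)
            i += 1
    # ASSUMPTION: an empty result means "print nothing for this URL".
    return "".join(out) or None


print(expand("%d (%s)", "http://example.net/about#contact"))
```

Keeping the conditional directives (%@, %:, %?, %#) in the same lookup table as the plain ones means a full URL can be rebuilt with a single format string such as %s://%a%p%?%q%#%f.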
▶ unfurl -h
Format URLs provided on stdin

Usage:
  unfurl [OPTIONS] [MODE] [FORMATSTRING]

Options:
  -u, --unique   Only output unique values
  -v, --verbose  Verbose mode (output URL parse errors)

Modes:
  keys      Keys from the query string (one per line)
  values    Values from the query string (one per line)
  keypairs  Key=value pairs from the query string (one per line)
  domains   The hostname (e.g. sub.example.com)
  paths     The request path (e.g. /users)
  apexes    The apex domain (e.g. example.com from sub.example.com)
  json      JSON encoded url/format objects
  format    Specify a custom format (see below)

Format Directives:
  %%  A literal percent character
  %s  The request scheme (e.g. https)
  %u  The user info (e.g. user:pass)
  %d  The domain (e.g. sub.example.com)
  %S  The subdomain (e.g. sub)
  %r  The root of domain (e.g. example)
  %t  The TLD (e.g. com)
  %P  The port (e.g. 8080)
  %p  The path (e.g. /users)
  %e  The path's file extension (e.g. jpg, html)
  %q  The raw query string (e.g. a=1&b=2)
  %f  The page fragment (e.g. page-section)
  %@  Inserts an @ if user info is specified
  %:  Inserts a colon if a port is specified
  %?  Inserts a question mark if a query string exists
  %#  Inserts a hash if a fragment exists
  %a  Authority (alias for %u%@%d%:%P)

Examples:
  cat urls.txt | unfurl keys
  cat urls.txt | unfurl format %s://%d%p?%q