parse-domain

Splits a hostname into subdomains, domain and (effective) top-level domains.

Since domain name registrars organize their namespaces in different ways, it's not straightforward to split a hostname into subdomains, the domain and top-level domains. To do that, parse-domain uses a large list of known top-level domains from publicsuffix.org:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain(
  // This should be a string with basic latin letters only.
  // More information below.
  "www.some.example.co.uk"
);

// Check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
  const { subDomains, domain, topLevelDomains } = parseResult;

  console.log(subDomains); // ["www", "some"]
  console.log(domain); // "example"
  console.log(topLevelDomains); // ["co", "uk"]
} else {
  // Read more about other parseResult types below...
}

This package has been designed for modern Node and browser environments with ECMAScript modules support. It assumes an ES2015 environment with Symbol(), URL() and TextDecoder() globally available. You need to transpile it down to ES5 (e.g. by using Babel) if you need to support older environments.

The list of top-level domains is stored as a trie and shipped in a compact serialization format, which keeps lookups fast and the library as small as possible.
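To make the trie idea concrete, here is a minimal sketch of a suffix lookup. It is not the library's actual data structure or serialization format: TrieNode, insertSuffix and longestSuffix are hypothetical names, the three suffixes are a hand-picked sample, and the wildcard and exception rules of the public suffix list are ignored.

```typescript
// Hypothetical trie node: children keyed by label, plus a flag that marks
// the end of a listed suffix.
type TrieNode = { children: Map<string, TrieNode>; isSuffix: boolean };

const createNode = (): TrieNode => ({ children: new Map(), isSuffix: false });

// Insert a suffix such as "co.uk" label by label, starting with the rightmost label.
function insertSuffix(root: TrieNode, suffix: string): void {
  let node = root;
  for (const label of suffix.split(".").reverse()) {
    let child = node.children.get(label);
    if (child === undefined) {
      child = createNode();
      node.children.set(label, child);
    }
    node = child;
  }
  node.isSuffix = true;
}

// Return the longest listed suffix of the hostname as labels, or [] if none matches.
function longestSuffix(root: TrieNode, hostname: string): string[] {
  const labels = hostname.split(".");
  let node = root;
  let depth = 0;
  let matched = 0;
  for (const label of labels.slice().reverse()) {
    const child = node.children.get(label);
    if (child === undefined) break;
    node = child;
    depth++;
    if (node.isSuffix) matched = depth;
  }
  return matched === 0 ? [] : labels.slice(labels.length - matched);
}

const sampleTrie = createNode();
for (const suffix of ["uk", "co.uk", "com"]) insertSuffix(sampleTrie, suffix);

console.log(longestSuffix(sampleTrie, "www.some.example.co.uk")); // ["co", "uk"]
```

Walking the labels from right to left means a lookup touches at most one node per label, regardless of how many suffixes are stored.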


Installation

npm install parse-domain

Updates

💡 Please note: publicsuffix.org is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of npm publish. In order to get an up-to-date list, you should run npx parse-domain-update every time you start or build your application. This will download the latest list from https://publicsuffix.org/list/public_suffix_list.dat.


Expected input

⚠️ parseDomain does not parse whole URLs. You should only pass the puny-encoded hostname section of the URL:

❌ Wrong                                          ✅ Correct
https://user@www.example.com:8080/path?query     www.example.com
münchen.de                                       xn--mnchen-3ya.de
食狮.com.cn?query                                 xn--85x722f.com.cn

The utility function fromUrl tries to extract the hostname from a (partial) URL and puny-encodes it:

import { parseDomain, fromUrl } from "parse-domain";

const { subDomains, domain, topLevelDomains } = parseDomain(
  fromUrl("https://www.münchen.de?query")
);

console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]

// You can use the 'punycode' NPM package to decode the domain again
import { toUnicode } from "punycode";

console.log(toUnicode(domain)); // "münchen"

fromUrl parses the URL using new URL(). Depending on your target environments, you need to make sure that there is a polyfill for it. It's globally available in all modern browsers (no IE) and in Node v10 and later.
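The following sketch shows how the global URL constructor can be used to pull a puny-encoded hostname out of a URL-like string. It is not the implementation of fromUrl: extractHostname is a hypothetical helper, and prepending http:// to inputs without a scheme is an assumption made for this example.

```typescript
// Hypothetical helper, not the library's fromUrl.
function extractHostname(input: string): string | undefined {
  // Assumption: inputs without "scheme://" are treated as scheme-less URLs.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const candidate = hasScheme ? input : `http://${input}`;

  try {
    // The URL parser strips credentials, port, path and query,
    // and converts international hostnames to their punycode form.
    return new URL(candidate).hostname;
  } catch {
    // new URL() throws on input it cannot parse.
    return undefined;
  }
}

console.log(extractHostname("https://user@www.example.com:8080/path?query")); // "www.example.com"
console.log(extractHostname("https://www.münchen.de?query")); // "www.xn--mnchen-3ya.de"
```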

Expected output

When parsing a hostname, there are 5 possible results:

  • the hostname is invalid
  • the hostname is an IP address
  • the hostname is formally correct and the domain is
    • reserved
    • not listed in the public suffix list
    • listed in the public suffix list

parseDomain returns a ParseResult with a type property that lets you distinguish these cases.

👉 Invalid domains

The given input is first validated against RFC 3696 (the domain labels are limited to basic latin letters, numbers and hyphens). If the validation fails, parseResult.type will be ParseResultType.Invalid:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("münchen.de");

console.log(parseResult.type === ParseResultType.Invalid); // true

Check out the API if you need more information about the validation error.

If you don't want the characters to be validated (e.g. because you need to allow underscores in hostnames), there's also a more relaxed validation mode (according to RFC 2181).

import { parseDomain, ParseResultType, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});

console.log(parseResult.type === ParseResultType.Listed); // true

See also #134 for the discussion.

👉 IP addresses

If the given input is an IP address, parseResult.type will be ParseResultType.Ip:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("192.168.2.1");

console.log(parseResult.type === ParseResultType.Ip); // true
console.log(parseResult.ipVersion); // 4

It's debatable whether a library for parsing domains should also accept IP addresses. In fact, you could argue that parseDomain should reject an IP address as invalid. While this is true from a technical point of view, we decided to report IP addresses in a special way because we assume that a lot of people are using this library to make sense of an arbitrary hostname (see #102).

👉 Reserved domains

There are 5 top-level domains that are not listed in the public suffix list but reserved according to RFC 6761 and RFC 6762:

  • localhost
  • local
  • example
  • invalid
  • test

In these cases, parseResult.type will be ParseResultType.Reserved:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("pecorino.local");

console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.labels); // ["pecorino", "local"]

👉 Domains that are not listed

If the given hostname is valid, but not listed in the downloaded public suffix list, parseResult.type will be ParseResultType.NotListed:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("this.is.not-listed");

console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.labels); // ["this", "is", "not-listed"]

A domain may be reported as not listed simply because the bundled list is outdated. Make sure to update the list once in a while.

⚠️ Do not treat parseDomain as an authoritative answer. It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.

👉 Effective top-level domains

Technically, the term top-level domain describes the very last domain in a hostname (uk in example.co.uk). Most people, however, use the term top-level domain for the public suffix which is a namespace "under which Internet users can directly register names".

Some examples for public suffixes:

  • com in example.com
  • co.uk in example.co.uk
  • co in example.co
  • but also com.co in example.com.co

If the hostname is listed in the public suffix list, the parseResult.type will be ParseResultType.Listed:

import { parseDomain, ParseResultType } from "parse-domain";

const parseResult = parseDomain("example.co.uk");

console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.labels); // ["example", "co", "uk"]

In that case, parseResult also provides the subDomains, domain and topLevelDomains properties:

const { subDomains, domain, topLevelDomains } = parseResult;

console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]

👉 Switch over parseResult.type to distinguish between different parse results

We recommend switching over the parseResult.type:

switch (parseResult.type) {
  case ParseResultType.Listed: {
    const { hostname, topLevelDomains } = parseResult;

    console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
    break;
  }
  case ParseResultType.Reserved:
  case ParseResultType.NotListed: {
    const { hostname } = parseResult;

    console.log(`${hostname} is a reserved or unknown domain`);
    break;
  }
  default:
    // hostname is only declared inside the case blocks above, so read it from parseResult.
    // String() also handles the NO_HOSTNAME symbol, which a template literal would reject.
    throw new Error(
      `${String(parseResult.hostname)} is an ip address or invalid domain`
    );
}
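If you want the compiler to tell you when a result type is left unhandled, a never check in the default branch is one option. The sketch below uses local stand-in types that loosely mirror the shapes from the API section (SketchResult and describeResult are hypothetical names, and hostname is simplified to string) so that it runs without the package installed.

```typescript
// Simplified local stand-ins for the parse result shapes (assumed for this sketch).
type SketchResult =
  | { type: "INVALID"; hostname: string }
  | { type: "IP"; hostname: string; ipVersion: 4 | 6 }
  | { type: "RESERVED" | "NOT_LISTED"; hostname: string; labels: string[] }
  | { type: "LISTED"; hostname: string; labels: string[]; topLevelDomains: string[] };

function describeResult(result: SketchResult): string {
  switch (result.type) {
    case "LISTED":
      return `${result.hostname} belongs to ${result.topLevelDomains.join(".")}`;
    case "RESERVED":
    case "NOT_LISTED":
      return `${result.hostname} is a reserved or unknown domain`;
    case "IP":
      return `${result.hostname} is an IPv${result.ipVersion} address`;
    case "INVALID":
      return `${result.hostname} is not a valid hostname`;
    default: {
      // Exhaustiveness check: this assignment stops compiling
      // as soon as a member of SketchResult is not handled above.
      const unreachable: never = result;
      return unreachable;
    }
  }
}

console.log(
  describeResult({
    type: "LISTED",
    hostname: "example.co.uk",
    labels: ["example", "co", "uk"],
    topLevelDomains: ["co", "uk"],
  })
); // "example.co.uk belongs to co.uk"
```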

⚠️ Effective TLDs !== TLDs acknowledged by ICANN

What's surprising to a lot of people is that the definition of public suffix means that regular user domains can become effective top-level domains:

const { subDomains, domain, topLevelDomains } = parseDomain(
  "parse-domain.github.io"
);

console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯

In this case, github.io is nothing other than a private domain name registrar. github.io is the effective top-level domain, and browsers treat it as such (e.g. for setting document.domain).

If you want to deviate from the browser's understanding of a top-level domain and you're only interested in top-level domains acknowledged by ICANN, there's an icann property:

const parseResult = parseDomain("parse-domain.github.io");
const { subDomains, domain, topLevelDomains } = parseResult.icann;

console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]
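The difference between the two views comes down to which suffix rules are taken into account. The sketch below illustrates this with two tiny, hand-picked sets; icannSuffixes, privateSuffixes and suffixLength are hypothetical names, and the real public suffix list is far larger and also contains wildcard and exception rules that are ignored here.

```typescript
// Hand-picked sample entries (assumption: not the real list).
const icannSuffixes = new Set(["io", "com", "uk", "co.uk"]);
const privateSuffixes = new Set(["github.io"]);

// Number of trailing labels that form the longest suffix
// found in any of the given sets (0 if nothing matches).
function suffixLength(labels: string[], ...lists: Set<string>[]): number {
  for (let start = 0; start < labels.length; start++) {
    const candidate = labels.slice(start).join(".");
    if (lists.some((list) => list.has(candidate))) {
      return labels.length - start;
    }
  }
  return 0;
}

const hostLabels = "parse-domain.github.io".split(".");

// Browser-like view: ICANN and private suffixes both count.
console.log(suffixLength(hostLabels, icannSuffixes, privateSuffixes)); // 2
// ICANN-only view: private suffixes are ignored.
console.log(suffixLength(hostLabels, icannSuffixes)); // 1
```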

⚠️ domain can also be undefined

const { subDomains, domain, topLevelDomains } = parseDomain("co.uk");

console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]
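Why domain ends up undefined becomes clearer when you look at how labels can be split once the length of the public suffix is known. This is a sketch of that splitting step only, under the assumption that the suffix length has already been determined; splitLabels is a hypothetical helper, not part of the package.

```typescript
type SplitResult = {
  subDomains: string[];
  domain: string | undefined;
  topLevelDomains: string[];
};

// Assumes 0 <= suffixLength <= labels.length.
function splitLabels(labels: string[], suffixLength: number): SplitResult {
  const boundary = labels.length - suffixLength;
  const topLevelDomains = labels.slice(boundary);
  const subDomains = labels.slice(0, boundary);
  // The label directly in front of the suffix is the domain.
  // pop() returns undefined when no label is left, i.e. when the
  // hostname itself is a public suffix.
  const domain = subDomains.pop();

  return { subDomains, domain, topLevelDomains };
}

console.log(splitLabels(["www", "some", "example", "co", "uk"], 2));
// { subDomains: ["www", "some"], domain: "example", topLevelDomains: ["co", "uk"] }
console.log(splitLabels(["co", "uk"], 2));
// { subDomains: [], domain: undefined, topLevelDomains: ["co", "uk"] }
```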

⚠️ "" is a valid (but reserved) domain

The empty string "" represents the DNS root and is considered to be valid. parseResult.type will be ParseResultType.Reserved in that case:

const { type, subDomains, domain, topLevelDomains } = parseDomain("");

console.log(type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []

API

🧩 = JavaScript export
🧬 = TypeScript export

🧩 export parseDomain(hostname: string | typeof NO_HOSTNAME, options?: ParseDomainOptions): ParseResult

Takes a hostname (e.g. "www.example.com") and returns a ParseResult. The hostname must only contain basic latin letters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.

import { parseDomain } from "parse-domain";

const parseResult = parseDomain("www.example.com");

Use Validation.Lax if you want to allow all characters:

import { parseDomain, Validation } from "parse-domain";

const parseResult = parseDomain("_jabber._tcp.gmail.com", {
  validation: Validation.Lax,
});

🧩 export fromUrl(input: string): string | typeof NO_HOSTNAME

Takes a URL-like string and tries to extract the hostname. Requires the global URL constructor to be available on the platform. Returns the NO_HOSTNAME symbol if the input was not a string or the hostname could not be extracted. Take a look at the test suite for some examples. Does not throw an error, even with invalid input.

🧩 export NO_HOSTNAME: unique symbol

NO_HOSTNAME is a symbol that is returned by fromUrl when it was not able to extract a hostname from the given string. When passed to parseDomain, it will always yield a ParseResultInvalid.
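A symbol works well as a sentinel here because it can never be confused with a real hostname string; note that even the empty string "" is a valid hostname for parseDomain. The sketch below shows the general pattern with a local stand-in. SKETCH_NO_HOSTNAME and hostnameOrSentinel are hypothetical names, and the actual check performed by the library may differ.

```typescript
// Local stand-in for the exported sentinel (assumed for this sketch).
const SKETCH_NO_HOSTNAME: unique symbol = Symbol("NO_HOSTNAME");

// Returns the input if it is a string, otherwise the sentinel.
function hostnameOrSentinel(input: unknown): string | typeof SKETCH_NO_HOSTNAME {
  return typeof input === "string" ? input : SKETCH_NO_HOSTNAME;
}

const candidate = hostnameOrSentinel(42);

if (candidate === SKETCH_NO_HOSTNAME) {
  console.log("No hostname available");
} else {
  // TypeScript has narrowed candidate to string in this branch.
  console.log(candidate.toLowerCase());
}
```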

🧬 export type ParseDomainOptions

export type ParseDomainOptions = {
  /**
   * If no validation is specified, Validation.Strict will be used.
   **/
  validation?: Validation;
};

🧩 export Validation

An object that holds all possible values for the validation option:

export const Validation = {
  /**
   * Allows any octets as labels
   * but still restricts the length of labels and the overall domain.
   *
   * @see https://www.rfc-editor.org/rfc/rfc2181#section-11
   **/
  Lax: "LAX",

  /**
   * Only allows ASCII letters, digits and hyphens (aka LDH),
   * forbids hyphens at the beginning or end of a label
   * and requires top-level domain names not to be all-numeric.
   *
   * This is the default if no validation is configured.
   *
   * @see https://datatracker.ietf.org/doc/html/rfc3696#section-2
   */
  Strict: "STRICT",
};
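The rules described in the comments above can be approximated with a few checks. The following is a simplified sketch, not the library's validation code: validateHostname and SketchMode are hypothetical names, which rule maps to which error code is an assumption, and lengths are counted in UTF-16 code units rather than octets.

```typescript
type SketchMode = "LAX" | "STRICT";

// Letters, digits and hyphens only; no hyphen at the beginning or end.
const ldhLabel = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i;

// Returns a list of error codes; an empty list means the hostname passed.
function validateHostname(hostname: string, mode: SketchMode): string[] {
  // The empty string stands for the DNS root and is considered valid.
  if (hostname === "") return [];

  const problems: string[] = [];
  if (hostname.length > 253) problems.push("DOMAIN_MAX_LENGTH");

  const labels = hostname.split(".");
  for (const label of labels) {
    if (label.length < 1) {
      problems.push("LABEL_MIN_LENGTH");
    } else if (label.length > 63) {
      problems.push("LABEL_MAX_LENGTH");
    } else if (mode === "STRICT" && !ldhLabel.test(label)) {
      problems.push("LABEL_INVALID_CHARACTER");
    }
  }

  // Strict mode additionally rejects an all-numeric last label.
  const lastLabel = labels[labels.length - 1] ?? "";
  if (mode === "STRICT" && /^\d+$/.test(lastLabel)) {
    problems.push("LAST_LABEL_INVALID");
  }

  return problems;
}

console.log(validateHostname("_jabber._tcp.gmail.com", "LAX")); // []
console.log(validateHostname(".com", "STRICT")); // ["LABEL_MIN_LENGTH"]
```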

🧬 export Validation

This type represents all possible validation values.

🧬 export ParseResult

A ParseResult is either a ParseResultInvalid, ParseResultIp, ParseResultReserved, ParseResultNotListed or ParseResultListed.

All parse results have a type property that is either "INVALID", "IP", "RESERVED", "NOT_LISTED" or "LISTED". Use the exported ParseResultType to check for the type instead of checking against string literals.

All parse results also have a hostname property that provides access to the sanitized hostname that was passed to parseDomain.

🧩 export ParseResultType

An object that holds all possible ParseResult type values:

const ParseResultType = {
  Invalid: "INVALID",
  Ip: "IP",
  Reserved: "RESERVED",
  NotListed: "NOT_LISTED",
  Listed: "LISTED",
};

🧬 export ParseResultType

This type represents all possible ParseResult type values.

🧬 export ParseResultInvalid

Describes the shape of the parse result that is returned when the given hostname does not adhere to RFC 1034:

  • The hostname is not a string
  • The hostname is longer than 253 characters
  • A domain label is shorter than 1 character
  • A domain label is longer than 63 characters
  • A domain label contains a character that is not a basic latin character, digit or hyphen

type ParseResultInvalid = {
  type: ParseResultType.Invalid;
  hostname: string | typeof NO_HOSTNAME;
  errors: Array<ValidationError>;
};

// Example

{
  type: "INVALID",
  hostname: ".com",
  errors: [...]
}

🧬 export ValidationError

Describes the shape of a validation error as returned by parseDomain:

type ValidationError = {
  type: ValidationErrorType;
  message: string;
  column: number;
};

// Example

{
  type: "LABEL_MIN_LENGTH",
  message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
  column: 1,
}

🧩 export ValidationErrorType

An object that holds all possible ValidationError type values:

const ValidationErrorType = {
  NoHostname: "NO_HOSTNAME",
  DomainMaxLength: "DOMAIN_MAX_LENGTH",
  LabelMinLength: "LABEL_MIN_LENGTH",
  LabelMaxLength: "LABEL_MAX_LENGTH",
  LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
  LastLabelInvalid: "LAST_LABEL_INVALID",
};

🧬 export ValidationErrorType

This type represents all possible type values of a ValidationError.

🧬 export ParseResultIp

This type describes the shape of the parse result that is returned when the given hostname was an IPv4 or IPv6 address.

type ParseResultIp = {
  type: ParseResultType.Ip;
  hostname: string;
  ipVersion: 4 | 6;
};

// Example

{
  type: "IP",
  hostname: "192.168.0.1",
  ipVersion: 4
}

According to RFC 3986, IPv6 addresses need to be surrounded by [ and ] in URLs. parseDomain accepts IPv6 addresses both with and without square brackets:

// Recognized as IPv4 address
parseDomain("192.168.0.1");
// Both are recognized as proper IPv6 addresses
parseDomain("::");
parseDomain("[::]");
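Accepting both notations can be as simple as removing one pair of enclosing brackets before the address is inspected. The sketch below shows that step together with a deliberately simple IPv4 check; stripIpv6Brackets and isIpv4 are hypothetical helpers, IPv6 validation is left out entirely, and none of this is taken from the library's source.

```typescript
// Removes one pair of enclosing square brackets, if present.
function stripIpv6Brackets(hostname: string): string {
  const bracketed = hostname.startsWith("[") && hostname.endsWith("]");
  return bracketed ? hostname.slice(1, -1) : hostname;
}

// Simplified IPv4 check: four dot-separated decimal numbers from 0 to 255.
function isIpv4(hostname: string): boolean {
  const parts = hostname.split(".");
  return (
    parts.length === 4 &&
    parts.every((part) => /^\d{1,3}$/.test(part) && Number(part) <= 255)
  );
}

console.log(stripIpv6Brackets("[::]")); // "::"
console.log(stripIpv6Brackets("::")); // "::"
console.log(isIpv4("192.168.0.1")); // true
```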

🧬 export ParseResultReserved

This type describes the shape of the parse result that is returned when the given hostname

  • is the root domain (the empty string "")
  • belongs to the top-level domain localhost, local, example, invalid or test

type ParseResultReserved = {
  type: ParseResultType.Reserved;
  hostname: string;
  labels: Array<string>;
};

// Example

{
  type: "RESERVED",
  hostname: "pecorino.local",
  labels: ["pecorino", "local"]
}

⚠️ Reserved IPs, such as 127.0.0.1, will not be reported as reserved, but as ParseResultIp. See #117.

🧬 export ParseResultNotListed

Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the downloaded public suffix list.

type ParseResultNotListed = {
  type: ParseResultType.NotListed;
  hostname: string;
  labels: Array<string>;
};

// Example

{
  type: "NOT_LISTED",
  hostname: "this.is.not-listed",
  labels: ["this", "is", "not-listed"]
}

🧬 export ParseResultListed

Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list.

type ParseResultListed = {
  type: ParseResultType.Listed;
  hostname: string;
  labels: Array<string>;
  subDomains: Array<string>;
  domain: string | undefined;
  topLevelDomains: Array<string>;
  icann: {
    subDomains: Array<string>;
    domain: string | undefined;
    topLevelDomains: Array<string>;
  };
};

// Example

{
  type: "LISTED",
  hostname: "parse-domain.github.io",
  labels: ["parse-domain", "github", "io"],
  subDomains: [],
  domain: "parse-domain",
  topLevelDomains: ["github", "io"],
  icann: {
    subDomains: ["parse-domain"],
    domain: "github",
    topLevelDomains: ["io"]
  }
}

License

MIT
