tld

Extract the top-level domain (TLD) from a given URL. The list of TLD names is taken from the Public Suffix list.

Raises an exception on non-existent TLDs, or fails silently and returns None if the fail_silently argument is set to True.

Prerequisites

  • Python 3.7, 3.8, 3.9, 3.10 or 3.11.

Documentation

Documentation is available on Read the Docs.

Installation

Latest stable version on PyPI:

pip install tld

Or latest stable version from GitHub:

pip install https://github.com/barseghyanartur/tld/archive/stable.tar.gz

Usage examples

In addition to the examples below, see the Jupyter notebook workbook file.

Get the TLD name as string from the URL given

from tld import get_tld

get_tld("http://www.google.co.uk")
# 'co.uk'

get_tld("http://www.google.idontexist", fail_silently=True)
# None

Get the TLD as an object

from tld import get_tld

res = get_tld("http://some.subdomain.google.co.uk", as_object=True)

res
# 'co.uk'

res.subdomain
# 'some.subdomain'

res.domain
# 'google'

res.tld
# 'co.uk'

res.fld
# 'google.co.uk'

res.parsed_url
# SplitResult(
#     scheme='http',
#     netloc='some.subdomain.google.co.uk',
#     path='',
#     query='',
#     fragment=''
# )

Get TLD name, ignoring the missing protocol

from tld import get_tld, get_fld

get_tld("www.google.co.uk", fix_protocol=True)
# 'co.uk'

get_fld("www.google.co.uk", fix_protocol=True)
# 'google.co.uk'
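
To see why a missing protocol matters, here is a minimal standard-library sketch. It does not use tld itself, and ensure_scheme is a hypothetical helper written for this illustration; how the library handles fix_protocol internally may differ. Without a scheme, urlsplit treats the whole string as a path, so there is no host name to inspect.

```python
from urllib.parse import urlsplit


def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Prepend a scheme when the URL has none (hypothetical helper)."""
    if url.startswith("//"):
        # Protocol-relative URL: only the scheme name is missing.
        return f"{default_scheme}:{url}"
    if "://" not in url:
        return f"{default_scheme}://{url}"
    return url


bare = "www.google.co.uk"

# Without a scheme the host ends up in `path`, and `netloc` is empty.
print(repr(urlsplit(bare).netloc))
# ''

# With a scheme prepended, the host is parsed as expected.
print(urlsplit(ensure_scheme(bare)).netloc)
# www.google.co.uk
```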

Return TLD parts as tuple

from tld import parse_tld

parse_tld('http://www.google.com')
# ('com', 'google', 'www')
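
The (tld, domain, subdomain) split can be illustrated with a small longest-suffix match. This is a sketch under stated assumptions, not the library's implementation: SUFFIXES is a tiny hypothetical stand-in for the real list, split_host is a made-up name, and wildcard and exception rules are ignored.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical, tiny suffix set used only for this illustration.
SUFFIXES = {"com", "uk", "co.uk"}


def split_host(url: str) -> Optional[Tuple[str, str, str]]:
    """Return (tld, domain, subdomain), or None if no suffix matches."""
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Try the longest candidate suffix first. Starting at index 1
    # guarantees that at least one label is left over for the domain.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SUFFIXES:
            return suffix, labels[i - 1], ".".join(labels[: i - 1])
    return None


print(split_host("http://www.google.com"))
# ('com', 'google', 'www')

print(split_host("http://some.subdomain.google.co.uk"))
# ('co.uk', 'google', 'some.subdomain')

print(split_host("http://www.google.idontexist"))
# None
```

Checking longer suffixes first is what makes 'co.uk' win over 'uk' in the second call.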

Get the first level domain name as string from the URL given

from tld import get_fld

get_fld("http://www.google.co.uk")
# 'google.co.uk'

get_fld("http://www.google.idontexist", fail_silently=True)
# None

Check if some tld is a valid tld

from tld import is_tld

is_tld('co.uk')
# True

is_tld('uk')
# True

is_tld('tld.doesnotexist')
# False

is_tld('www.google.com')
# False

Update the list of TLD names

To sync the TLD names with the most recent version, run the following from your terminal:

update-tld-names

Or simply do:

from tld.utils import update_tld_names

update_tld_names()

Note that this updates all registered TLD source parsers (not only the list of TLD names taken from Mozilla). To run the update for a single parser, append the uid of that parser as an argument.

update-tld-names mozilla

Custom TLD parsers

By default, the list of TLD names is taken from Mozilla. Parsing is implemented in the tld.utils.MozillaTLDSourceParser class. If you want to use another parser, subclass tld.base.BaseTLDSourceParser, provide uid, source_url and local_path, and implement the get_tld_names method. Take tld.utils.MozillaTLDSourceParser as a good example of such an implementation. You can then use get_tld (as well as other tld module functions) as shown below:

from tld import get_tld
from some.module import CustomTLDSourceParser

get_tld(
    "http://www.google.co.uk",
    parser_class=CustomTLDSourceParser
)

Custom list of TLD names

You can maintain your own custom version of the TLD names list (even multiple ones) and use them alongside the built-in TLD names list.

Store the custom list locally and provide the path to it as shown below:

from tld import get_tld
from tld.utils import BaseMozillaTLDSourceParser

class CustomBaseMozillaTLDSourceParser(BaseMozillaTLDSourceParser):

    uid: str = 'custom_mozilla'
    local_path: str = 'tests/res/effective_tld_names_custom.dat.txt'

get_tld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)
# 'foreverchild'

Same goes for first level domain names:

from tld import get_fld

get_fld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)
# 'www.foreverchild'

Note that in both examples shown above, the original TLD names file has been modified in the following way:

...
// ===BEGIN ICANN DOMAINS===

// This one actually does not exist, added for testing purposes
foreverchild
...
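
For orientation, the file format in the excerpt above (one suffix per line, with // comments and blank lines) can be read with a few lines of Python. This is a rough sketch, not the parser shipped with tld: read_suffixes is a hypothetical name, and the co.uk entry in the sample is added purely for illustration.

```python
from typing import Iterable, Set


def read_suffixes(lines: Iterable[str]) -> Set[str]:
    """Collect suffix rules, skipping blank lines and // comments."""
    suffixes: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        # A rule is only read up to the first whitespace.
        suffixes.add(line.split()[0])
    return suffixes


# Sample modeled on the excerpt above; `co.uk` is an illustrative addition.
sample = """\
// ===BEGIN ICANN DOMAINS===

// This one actually does not exist, added for testing purposes
foreverchild
co.uk
"""

print(sorted(read_suffixes(sample.splitlines())))
# ['co.uk', 'foreverchild']
```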

Free up resources

To free up the memory occupied by a loaded custom TLD names list, call the reset_tld_names function with the tld_names_local_path parameter.

from tld import get_tld, reset_tld_names

# Get TLD from a custom TLD names parser
get_tld(
    "http://www.foreverchild",
    parser_class=CustomBaseMozillaTLDSourceParser
)

# Free resources occupied by the custom TLD names list
reset_tld_names("tests/res/effective_tld_names_custom.dat.txt")

Troubleshooting

If domain names from the TLD names list are not recognised, make sure you have the most recent version of the TLD names in your virtual environment:

update-tld-names

To update the TLD names list for a single parser, specify its uid as an argument:

update-tld-names mozilla

Testing

Simply type:

pytest

Or use tox:

tox

Or use tox to check a specific environment:

tox -e py39

Writing documentation

Keep the following heading hierarchy.

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************

License

MPL-1.1 OR GPL-2.0-only OR LGPL-2.1-or-later

Support

For security issues, contact me at the e-mail given in the Author section.

For all other issues, go to GitHub.

Author

Artur Barseghyan <[email protected]>
