  • Stars: 393
  • Rank: 106,915 (top 3%)
  • Language: Python
  • License: Do What The F*ck ...
  • Created over 11 years ago
  • Updated over 2 years ago

Repository Details

A Python module for retrieving and parsing WHOIS data

pythonwhois

A WHOIS retrieval and parsing library for Python.

Dependencies

None! All you need is the Python standard library.

Instructions

The manual (including install instructions) can be found in the doc/ directory. An HTML version is also viewable here.

Goals

  • 100% coverage of WHOIS formats.
  • Accurate and complete data.
  • Consistently functional parsing; constant tests to ensure the parser isn't accidentally broken.

Features

  • WHOIS data retrieval
    • Able to follow WHOIS server redirects
    • Won't get stuck on multiple-result responses from verisign-grs
  • WHOIS data parsing
    • Base information (registrar, etc.)
    • Dates/times (registration, expiry, ...)
    • Full registrant information (!)
    • Nameservers
  • Optional WHOIS data normalization
    • Attempts to intelligently reformat WHOIS data for better (human) readability
    • Converts various abbreviation types to full locality names
      • Airport codes
      • Country names (2- and 3-letter ISO codes)
      • US states and territories
      • Canadian provinces and territories
      • Australian states
  • pwhois, a simple WHOIS tool using pythonwhois
    • Easily readable output format
    • Can also output raw WHOIS data
    • ... and JSON.
  • Automated testing suite
    • Will detect and warn about any changes in parsed data compared to previous runs
    • Guarantees that previously working WHOIS parsing doesn't unintentionally break when changing code
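
To give a rough idea of what "following WHOIS server redirects" involves, here is a minimal standard-library sketch. It is not pythonwhois' actual implementation: the function names (`find_referral`, `query_whois`, `lookup`), the referral patterns, and the starting server are all assumptions made for illustration.

```python
import re
import socket

# Assumed referral markers; real registries use more variants than these.
REFERRAL_PATTERNS = [
    re.compile(r"^[ \t]*(?:Registrar )?WHOIS Server:[ \t]*(\S+)",
               re.IGNORECASE | re.MULTILINE),
    re.compile(r"^[ \t]*refer:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE),
]


def find_referral(response):
    """Return the hostname of a referred WHOIS server, or None."""
    for pattern in REFERRAL_PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1).lower()
    return None


def query_whois(server, query, timeout=10):
    """Send one query to a WHOIS server on TCP port 43 and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def lookup(domain, server="whois.iana.org", max_hops=5, query_func=query_whois):
    """Follow referrals starting at `server`; return all responses in order."""
    responses = []
    seen = set()
    for _ in range(max_hops):
        if server is None or server in seen:  # no referral, or a loop
            break
        seen.add(server)
        response = query_func(server, domain)
        responses.append(response)
        server = find_referral(response)
    return responses
```

The `query_func` parameter exists only so that the referral logic can be exercised without network access, for example by passing a function that returns canned responses.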

IP range WHOIS

pythonwhois does not yet support WHOIS lookups on IP ranges (including single IPs), although this will be added at some point in the future. In the meantime, consider using ipwhois - it offers functionality and an API similar to pythonwhois, but for IPs. It also supports delegated RWhois.

Do note that ipwhois does not offer a normalization feature, and does not (yet) come with a command-line tool. Additionally, ipwhois is maintained by Philip Hane and not by me; please make sure to file bugs relating to it in the ipwhois repository, not in that of pythonwhois.

Important update notes

2.4.0 and up: A lot of changes were made to the normalization, and the performance under Python 2.x was significantly improved. The average parsing time under Python 2.7 has dropped by 94% (!), and on my system averages out at 18ms. Performance under Python 3.x is unchanged. pythonwhois will now expand a lot of abbreviations in normalized mode, such as airport codes, ISO country codes, and US/CA/AU state abbreviations. The consequence of this is that the library is now bigger (as it ships a list of these abbreviations). Also note that there may be licensing consequences, in particular regarding the airport code database. More information about that can be found below.
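
As an illustration of what abbreviation expansion in normalized mode could look like, here is a small sketch. The lookup tables, the `normalize_contact` function, and the `country`/`state` keys are assumptions for this example; they are tiny stand-ins and not the datasets or code that pythonwhois ships.

```python
# Tiny illustrative tables (hypothetical stand-ins for the shipped datasets).
COUNTRIES = {
    "US": "United States", "USA": "United States",
    "CA": "Canada", "CAN": "Canada",
    "NL": "Netherlands", "NLD": "Netherlands",
}
US_STATES = {"CA": "California", "NY": "New York"}
CA_PROVINCES = {"ON": "Ontario", "BC": "British Columbia"}

# State tables are chosen by country, since codes such as "CA" are ambiguous.
STATES_BY_COUNTRY = {"United States": US_STATES, "Canada": CA_PROVINCES}


def normalize_contact(contact):
    """Return a copy of `contact` with known abbreviations expanded."""
    result = dict(contact)
    country = contact.get("country")
    if country:
        # Unknown codes are left untouched rather than guessed at.
        result["country"] = COUNTRIES.get(country.strip().upper(), country)
    state = contact.get("state")
    if state:
        table = STATES_BY_COUNTRY.get(result.get("country"), {})
        result["state"] = table.get(state.strip().upper(), state)
    return result


print(normalize_contact({"state": "CA", "country": "US"}))
```

Values that do not appear in a table are passed through unchanged, so a missing or incomplete table only means less expansion, not an error.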

2.3.0 and up: Python 3 support was fixed. Creation date parsing for contacts was fixed; correct timestamps will now be returned, rather than unformatted ones - if your application relies on the broken variant, you'll need to change your code. Some additional parameters were added to the net and parse methods to facilitate NIC handle lookups; the defaults are backwards-compatible, and these changes should not have any consequences for your code. Thai WHOIS parsing was implemented, but is a little spotty - data may occasionally be incorrectly split up. Please submit a bug report if you run across any issues.

2.2.0 and up: The internal workings of get_whois_raw have been changed, to better facilitate parsing of WHOIS data from registries that may return multiple partial matches for a query, such as whois.verisign-grs.com. This change means that, by default, get_whois_raw will now strip out the part of such a response that does not pertain directly to the requested domain. If your application requires an unmodified raw WHOIS response and is calling get_whois_raw directly, you should use the new never_cut parameter to keep pythonwhois from doing this post-processing. As this is a potentially breaking behaviour change, the minor version has been bumped.
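
The idea behind this post-processing can be sketched as follows. This is a simplified illustration and not the code used by get_whois_raw: the `cut_response` function is hypothetical, and it assumes that matches are separated by blank lines and that the relevant block starts with a `Domain Name:` line. Only the `never_cut` name is taken from the library.

```python
def cut_response(raw, domain, never_cut=False):
    """Keep only the block of `raw` that describes `domain`.

    With never_cut=True, or when no matching block is found, the
    response is returned unmodified.
    """
    if never_cut:
        return raw
    wanted = "domain name: " + domain.lower()
    for block in raw.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if lines and lines[0].strip().lower() == wanted:
            return block.strip() + "\n"
    return raw


raw = (
    "Server Name: EXAMPLE.COM.SOMEONE-ELSE.TEST\n"
    "Registrar: Registrar A\n"
    "\n"
    "Domain Name: EXAMPLE.COM\n"
    "Registrar: Registrar B\n"
)
print(cut_response(raw, "example.com"))
```

Falling back to the unmodified response when nothing matches is a deliberate choice in this sketch: it avoids throwing away data from registries that use a different layout.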

It doesn't work!

  • It doesn't work at all?
  • It doesn't parse the data for a particular domain?
  • There's an inaccuracy in parsing the data for a domain, even just a small one?

If any of those apply, don't hesitate to file an issue! The goal is 100% coverage, and we need your feedback to reach that goal.

License

This library may be used under the WTFPL - or, if you take issue with that, consider it to be under the CC0.

Data sources

This library uses a number of third-party datasets for normalization:

Be aware that the OpenFlights database in particular has potential licensing consequences; if you do not wish to be bound by these potential consequences, you may simply delete the airports.dat file from your distribution. pythonwhois will assume there is no database available, and will not perform airport code conversion (but still function correctly otherwise). This also applies to other included datasets.
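
A minimal sketch of how such an optional dataset can be handled is shown below. The `load_airports` and `expand_airport` functions and the two-column `code,city` file layout are made up for this example; the real airports.dat format and the loading code in pythonwhois may well differ.

```python
import csv
import os


def load_airports(path):
    """Return a {code: city} table, or an empty table if the file is absent."""
    if not os.path.isfile(path):
        return {}  # no database available: conversion is simply skipped
    table = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) >= 2:  # assumed layout: code,city
                table[row[0].strip().upper()] = row[1].strip()
    return table


def expand_airport(value, table):
    """Expand an airport code if it is known; otherwise return it unchanged."""
    return table.get(value.strip().upper(), value)
```

With an empty table, `expand_airport` returns its input as-is, which matches the behaviour described above: everything else keeps working, only the conversion is lost.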

Contributing

Feel free to fork and submit pull requests (to the develop branch)! If you change any parsing or normalization logic, make sure to run the full test suite before opening a pull request. Instructions for that are below.

Please note that this project uses tabs for indentation.

All commands are relative to the root directory of the repository.

Pull requests that do not include output from test.py will be rejected!

Adding new WHOIS data to the testing set

pwhois --raw thedomain.com > test/data/thedomain.com

Checking the currently parsed data (while editing the parser)

./pwhois -f test/data/thedomain.com/ .

(don't forget the dot at the end!)

Marking the current parsed data as correct for a domain

Make sure to verify (using pwhois or otherwise) that the WHOIS data for the domain is being parsed correctly, before marking it as correct!

./test.py update thedomain.com

Running all tests

./test.py run all

Testing a specific domain

./test.py run thedomain.com
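
Conceptually, the update and run commands above amount to a snapshot comparison: parsed data that was marked as correct is stored, and later runs are compared against it. The sketch below illustrates that idea only; it is not how test.py is implemented, and the `encode`, `update`, and `run` functions as well as the in-memory `store` are assumptions for this example.

```python
import json


def encode(parsed):
    """Serialize parsed data deterministically; dates become strings."""
    return json.dumps(parsed, sort_keys=True, default=str, indent=2)


def update(store, domain, parsed):
    """Mark the current parsed data for `domain` as correct."""
    store[domain] = encode(parsed)


def run(store, domain, parsed):
    """Return (key, expected, actual) for every top-level key that changed."""
    expected = json.loads(store[domain])
    actual = json.loads(encode(parsed))  # same round-trip as the snapshot
    diffs = []
    for key in sorted(set(expected) | set(actual)):
        if expected.get(key) != actual.get(key):
            diffs.append((key, expected.get(key), actual.get(key)))
    return diffs
```

An empty list from `run` means the parser still produces the data that was previously marked as correct; any entry points at a field whose parsed value changed.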

Running the full test suite, including support for multiple Python versions

tox

Generating documentation

You need ZippyDoc (which can be installed through pip install zippydoc).

zpy2html doc/*.zpy
