  • Language
    Python
  • License
    The Unlicense
  • Created over 9 years ago
  • Updated 5 months ago

Repository Details

A robust email syntax and deliverability validation library for Python.

email-validator: Validate Email Addresses

A robust email address syntax and deliverability validation library for Python 3.7+ by Joshua Tauberer.

This library validates that a string is of the form name@example.com and optionally checks that the domain name is set up to receive email. This is the sort of validation you would want when you are identifying users by their email address like on a registration/login form (but not necessarily for composing an email message, see below).

Key features:

  • Checks that an email address has the correct syntax --- good for registration/login forms or other uses related to identifying users.
  • Gives friendly English error messages when validation fails that you can display to end-users.
  • Checks deliverability (optional): Does the domain name resolve? (You can override the default DNS resolver to add query caching.)
  • Supports internationalized domain names and internationalized local parts.
  • Rejects addresses with unsafe Unicode characters, obsolete email address syntax that you'd find unexpected, special use domain names like @localhost, and domains without a dot by default. This is an opinionated library!
  • Normalizes email addresses (important for internationalized and quoted-string addresses! see below).
  • Python type annotations are used.

This is an opinionated library. You should definitely also consider using the less-opinionated pyIsEmail and flanker if they are better for your use case.

View the CHANGELOG / Release Notes for the version history of changes in the library. Occasionally this README is ahead of the latest published package --- see the CHANGELOG for details.


Installation

This package is on PyPI, so:

pip install email-validator

(You might need to use pip3 depending on your local environment.)

Quick Start

If you're validating a user's email address before creating a user account in your application, you might do this:

from email_validator import validate_email, EmailNotValidError

email = "my+address@example.org"

try:

  # Check that the email address is valid. Turn on check_deliverability
  # for first-time validations like on account creation pages (but not
  # login pages).
  emailinfo = validate_email(email, check_deliverability=False)

  # After this point, use only the normalized form of the email address,
  # especially before going to a database query.
  email = emailinfo.normalized

except EmailNotValidError as e:

  # The exception message is a human-readable explanation of why it's
  # not a valid (or deliverable) email address.
  print(str(e))

This validates the address and gives you its normalized form. You should put the normalized form in your database and always normalize before checking if an address is in your database. When using this in a login form, set check_deliverability to False to avoid unnecessary DNS queries.

Usage

Overview

The module provides a function validate_email(email_address) which takes an email address and:

  • Raises an EmailNotValidError with a helpful, human-readable error message explaining why the email address is not valid, or
  • Returns an object with a normalized form of the email address (which you should use!) and other information about it.

When an email address is not valid, validate_email raises either an EmailSyntaxError if the form of the address is invalid or an EmailUndeliverableError if the domain name fails DNS checks. Both exception classes are subclasses of EmailNotValidError, which in turn is a subclass of ValueError.

But when an email address is valid, an object is returned containing a normalized form of the email address (which you should use!) and other information.

The validator doesn't, by default, permit obsoleted forms of email addresses that no one uses anymore even though they are still valid and deliverable, since they will probably give you grief if you're using email for login. (See later in the document about how to allow some obsolete forms.)

The validator optionally checks that the domain name in the email address has a DNS MX record indicating that it can receive email. (Except a Null MX record. If there is no MX record, a fallback A/AAAA-record is permitted, unless a reject-all SPF record is present.) DNS is slow and sometimes unavailable or unreliable, so consider whether these checks are useful for your use case and turn them off if they aren't. There is nothing to be gained by trying to actually contact an SMTP server, so that's not done here. For privacy, security, and practicality reasons, servers are good at not giving away whether an address is deliverable or not: email addresses that appear to accept mail at first can bounce mail after a delay, and bounced mail may indicate a temporary failure of a good email address (sometimes an intentional failure, like greylisting).
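The decision rules in the paragraph above can be sketched as a pure function over DNS data that has already been fetched. This is an illustration of the described logic, not the library's actual implementation; the function name looks_deliverable and its parameters are hypothetical, and the SPF comparison is deliberately simplistic.

```python
def looks_deliverable(mx_records, has_address_record, spf_record=None):
    """Apply the rules described above to already-fetched DNS data.

    mx_records: list of (priority, exchange) tuples, possibly empty.
    has_address_record: True if the domain has an A or AAAA record.
    spf_record: the domain's SPF TXT value, or None if there is none.
    """
    # A Null MX record (RFC 7505) is an MX whose exchange is the root ".".
    real_mx = [(prio, host) for prio, host in mx_records if host.rstrip(".") != ""]
    if mx_records and not real_mx:
        return False  # the domain explicitly says it accepts no mail
    if real_mx:
        return True
    # No MX records at all: fall back to A/AAAA unless SPF rejects all mail.
    if has_address_record:
        return (spf_record or "").strip() != "v=spf1 -all"
    return False


print(looks_deliverable([(10, "mail.example.org.")], False))
print(looks_deliverable([(0, ".")], True))
print(looks_deliverable([], True, "v=spf1 -all"))
```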

Options

The validate_email function also accepts the following keyword arguments (defaults are as shown below):

check_deliverability=True: If true, DNS queries are made to check that the domain name in the email address (the part after the @-sign) can receive mail, as described above. Set to False to skip this DNS-based check. It is recommended to pass False when performing validation for login pages (but not account creation pages) since re-validation of a previously validated domain in your database by querying DNS at every login is probably undesirable. You can also set email_validator.CHECK_DELIVERABILITY to False to turn this off for all calls by default.

dns_resolver=None: Pass an instance of dns.resolver.Resolver to control the DNS resolver including setting a timeout and a cache. The caching_resolver function shown below is a helper function to construct a dns.resolver.Resolver with an LRUCache. Reuse the same resolver instance across calls to validate_email to make use of the cache.

test_environment=False: If True, DNS-based deliverability checks are disabled and test and **.test domain names are permitted (see below). You can also set email_validator.TEST_ENVIRONMENT to True to turn it on for all calls by default.

allow_smtputf8=True: Set to False to prohibit internationalized addresses that would require the SMTPUTF8 extension. You can also set email_validator.ALLOW_SMTPUTF8 to False to turn it off for all calls by default.

allow_quoted_local=False: Set to True to allow obscure and potentially problematic email addresses in which the part of the address before the @-sign contains spaces, @-signs, or other surprising characters when the local part is surrounded in quotes (so-called quoted-string local parts). In the object returned by validate_email, the normalized local part removes any unnecessary backslash-escaping and even removes the surrounding quotes if the address would be valid without them. You can also set email_validator.ALLOW_QUOTED_LOCAL to True to turn this on for all calls by default.

allow_domain_literal=False: Set to True to allow bracketed IPv4 and "IPv6:"-prefixed IPv6 addresses in the domain part of the email address. No deliverability checks are performed for these addresses. In the object returned by validate_email, the normalized domain will use the condensed IPv6 format, if applicable. The object's domain_address attribute will hold the parsed ipaddress.IPv4Address or ipaddress.IPv6Address object if applicable. You can also set email_validator.ALLOW_DOMAIN_LITERAL to True to turn this on for all calls by default.

allow_empty_local=False: Set to True to allow an empty local part (i.e. @example.com), e.g. for validating Postfix aliases.

DNS timeout and cache

When validating many email addresses or to control the timeout (the default is 15 seconds), create a caching dns.resolver.Resolver to reuse in each call. The caching_resolver function returns one easily for you:

from email_validator import validate_email, caching_resolver

resolver = caching_resolver(timeout=10)

while True:
  validate_email(email, dns_resolver=resolver)

Test addresses

This library rejects email addresses that use the Special Use Domain Names invalid, localhost, test, and some others by raising EmailSyntaxError. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to localhost (although they might be able to still do so via a malicious MX record). However, in your non-production test environments you may want to use @test or @myname.test email addresses. There are three ways you can allow this:

  1. Add test_environment=True to the call to validate_email (see above).
  2. Set email_validator.TEST_ENVIRONMENT to True globally.
  3. Remove the special-use domain name that you want to use from email_validator.SPECIAL_USE_DOMAIN_NAMES, e.g.:
import email_validator
email_validator.SPECIAL_USE_DOMAIN_NAMES.remove("test")

It is tempting to use @example.com/net/org in tests. They are not in this library's SPECIAL_USE_DOMAIN_NAMES list so you can, but shouldn't, use them. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will nevertheless reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or @test or @myname.test instead.

Internationalized email addresses

The email protocol SMTP and the domain name system DNS have historically only allowed English (ASCII) characters in email addresses and domain names, respectively. Each has adapted to internationalization in a separate way, creating two separate aspects to email address internationalization.

Internationalized domain names (IDN)

The first is internationalized domain names (RFC 5891), a.k.a. IDNA 2008. The DNS system has not been updated with Unicode support. Instead, internationalized domain names are converted into a special IDNA ASCII "Punycode" form starting with xn--. When an email address has non-ASCII characters in its domain part, the domain part is replaced with its IDNA ASCII equivalent form in the process of mail transmission. Your mail submission library probably does this for you transparently. (Compliance around the web is not very good though.) This library conforms to IDNA 2008 using the idna module by Kim Davies.
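To make the Punycode form concrete, here is a small sketch using only Python's standard library. Note the assumption: the built-in "idna" codec implements the older IDNA 2003 rules rather than IDNA 2008, so it is not what this library uses, but the two agree for a simple label like this one.

```python
# "\u30c4" is the katakana character used in the examples later in this document.
unicode_domain = "\u30c4.life"

# Encode each label to its ASCII "xn--" form with the standard library codec.
ascii_domain = unicode_domain.encode("idna").decode("ascii")
print(ascii_domain)

# Decoding the ASCII form gives back the Unicode domain.
round_trip = ascii_domain.encode("ascii").decode("idna")
print(round_trip == unicode_domain)
```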

Internationalized local parts

The second sort of internationalization is internationalization in the local part of the address (before the @-sign). In non-internationalized email addresses, only English letters, numbers, and some punctuation (._!#$%&'^`*+-=~/?{|}) are allowed. In internationalized email address local parts, a wider range of Unicode characters are allowed.

A surprisingly large number of Unicode characters are not safe to display, especially when the email address is concatenated with other text, so this library tries to protect you by not permitting reserved, non-character, private-use, formatting (which can be used to alter the display order of characters), whitespace, and control characters, nor combining characters as the first character of the local part or the domain name (so that they cannot combine with something outside of the email address string or with the @-sign). See https://qntm.org/safe and https://trojansource.codes/ for relevant prior work. (Other than whitespace, these are checks that you should be applying to nearly all user inputs in a security-sensitive context.)
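The kind of check described above can be approximated with the standard unicodedata module. This is a rough sketch, not the library's actual rule set: the choice of general categories and the name find_unsafe are assumptions made for illustration.

```python
import unicodedata

# General categories treated as unsafe in this sketch: control (Cc),
# format (Cf), private use (Co), unassigned or noncharacter (Cn), and the
# separator categories (Zs, Zl, Zp) that cover whitespace.
UNSAFE_CATEGORIES = {"Cc", "Cf", "Co", "Cn", "Zs", "Zl", "Zp"}


def find_unsafe(text):
    """Return a list of (index, character, reason) for problematic characters."""
    problems = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in UNSAFE_CATEGORIES:
            problems.append((i, ch, category))
        elif i == 0 and category.startswith("M"):
            # A combining mark at the start could attach to preceding text.
            problems.append((i, ch, "leading combining character"))
    return problems


print(find_unsafe("alice"))
print(find_unsafe("ali\u202ece"))   # contains a right-to-left override
print(find_unsafe("\u0301alice"))   # starts with a combining acute accent
```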

These character checks are performed after Unicode normalization (see below), so you are only fully protected if you replace all user-provided email addresses with the normalized email address string returned by this library. This does not guard against the well known problem that many Unicode characters look alike (or are identical), which can be used to fool humans reading displayed text.

Email addresses with these non-ASCII characters require that your mail submission library and the mail servers along the route to the destination, including your own outbound mail server, all support the SMTPUTF8 (RFC 6531) extension. Support for SMTPUTF8 varies. See the allow_smtputf8 parameter.

If you know ahead of time that SMTPUTF8 is not supported by your mail submission stack

By default all internationalized forms are accepted by the validator. But if you know ahead of time that SMTPUTF8 is not supported by your mail submission stack, then you must filter out addresses that require SMTPUTF8 using the allow_smtputf8=False keyword argument (see above). This will cause the validation function to raise an EmailSyntaxError if delivery would require SMTPUTF8. That's just in those cases where non-ASCII characters appear before the @-sign. If you do not set allow_smtputf8=False, you can also check the value of the smtputf8 field in the returned object.

If your mail submission library doesn't support Unicode at all --- even in the domain part of the address --- then immediately prior to mail submission you must replace the email address with its ASCII-ized form. This library gives you back the ASCII-ized form in the ascii_email field in the returned object, which you can get like this:

emailinfo = validate_email(email, allow_smtputf8=False)
email = emailinfo.ascii_email

The local part is left alone (if it has internationalized characters allow_smtputf8=False will force validation to fail) and the domain part is converted to IDNA ASCII. (You probably should not do this at account creation time so you don't change the user's login information without telling them.)

Normalization

Unicode Normalization

The use of Unicode in email addresses introduced a normalization problem. Different Unicode strings can look identical and have the same semantic meaning to the user. The normalized field returned on successful validation provides the correctly normalized form of the given email address.

For example, the CJK fullwidth Latin letters are considered semantically equivalent in domain names to their ASCII counterparts. This library normalizes them to their ASCII counterparts:

emailinfo = validate_email("me@Ｄｏｍａｉｎ.com")
print(emailinfo.normalized)
print(emailinfo.ascii_email)
# prints "me@domain.com" twice

Because an end-user might type their email address in different (but equivalent) un-normalized forms at different times, you ought to replace what they enter with the normalized form immediately prior to going into your database (during account creation), querying your database (during login), or sending outbound mail. Normalization may also change the length of an email address, and this may affect whether it is valid and acceptable by your SMTP provider.

The normalizations include lowercasing the domain part of the email address (domain names are case-insensitive), Unicode "NFC" normalization of the whole address (which turns characters plus combining characters into precomposed characters where possible), replacement of fullwidth and halfwidth characters in the domain part, possibly other UTS46 mappings on the domain part, and conversion from Punycode to Unicode characters.

(See RFC 6532 (internationalized email) section 3.1 and RFC 5895 (IDNA 2008) section 2.)
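Two of these normalizations can be illustrated with the standard unicodedata module. This is only a sketch: NFC is what the text above describes for the whole address, while NFKC plus lowercasing is used here as a stand-in for the UTS46 mappings that the library applies to the domain part.

```python
import unicodedata

# "e" followed by a combining acute accent becomes one precomposed character.
decomposed = "jose\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))

# Fullwidth "Domain" (U+FF24 U+FF4F U+FF4D U+FF41 U+FF49 U+FF4E) followed by ".com".
fullwidth = "\uff24\uff4f\uff4d\uff41\uff49\uff4e.com"
approx = unicodedata.normalize("NFKC", fullwidth).lower()
print(approx)
```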

Other Normalization

Normalization is also applied to quoted-string local parts and domain literal IPv6 addresses if you have allowed them by the allow_quoted_local and allow_domain_literal options. In quoted-string local parts, unnecessary backslash escaping is removed and even the surrounding quotes are removed if they are unnecessary. For IPv6 domain literals, the IPv6 address is normalized to condensed form. RFC 2142 also requires lowercase normalization for some specific mailbox names like postmaster@.
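The condensed IPv6 form mentioned above is the one produced by Python's ipaddress module. A small sketch, using a made-up documentation-range address and doing the bracket handling by hand rather than through this library:

```python
import ipaddress

literal = "[IPv6:2001:0db8:0000:0000:0000:0000:0000:0001]"
prefix, suffix = "[IPv6:", "]"

# Parse the address between the prefix and the closing bracket.
address = ipaddress.IPv6Address(literal[len(prefix):-len(suffix)])

# Rebuild the domain literal around the condensed form.
condensed = f"{prefix}{address.compressed}{suffix}"
print(condensed)
```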

Examples

For the email address test@joshdata.me, the returned object is:

ValidatedEmail(
  normalized='test@joshdata.me',
  local_part='test',
  domain='joshdata.me',
  ascii_email='test@joshdata.me',
  ascii_local_part='test',
  ascii_domain='joshdata.me',
  smtputf8=False)

For the fictitious but valid address example@ツ.ⓁⒾⒻⒺ, which has an internationalized domain but ASCII local part, the returned object is:

ValidatedEmail(
  normalized='example@ツ.life',
  local_part='example',
  domain='ツ.life',
  ascii_email='example@xn--bdk.life',
  ascii_local_part='example',
  ascii_domain='xn--bdk.life',
  smtputf8=False)

Note that normalized and other fields provide a normalized form of the email address, domain name, and (in other cases) local part (see earlier discussion of normalization), which you should use in your database.

Calling validate_email with the ASCII form of the above email address, example@xn--bdk.life, returns the exact same information (i.e., the normalized field always will contain Unicode characters, not Punycode).

For the fictitious address ツ-test@joshdata.me, which has an internationalized local part, the returned object is:

ValidatedEmail(
  normalized='ツ-test@joshdata.me',
  local_part='ツ-test',
  domain='joshdata.me',
  ascii_email=None,
  ascii_local_part=None,
  ascii_domain='joshdata.me',
  smtputf8=True)

Now smtputf8 is True and ascii_email is None because the local part of the address is internationalized. The local_part and normalized fields return the normalized form of the address.

Return value

When an email address passes validation, the fields in the returned object are:

normalized: The normalized form of the email address that you should put in your database. This combines the local_part and domain fields (see below).

ascii_email: If set, an ASCII-only form of the normalized email address by replacing the domain part with IDNA Punycode. This field will be present when an ASCII-only form of the email address exists (including if the email address is already ASCII). If the local part of the email address contains internationalized characters, ascii_email will be None. If set, it merely combines ascii_local_part and ascii_domain.

local_part: The normalized local part of the given email address (before the @-sign). Normalization includes Unicode NFC normalization and removing unnecessary quoted-string quotes and backslashes. If allow_quoted_local is True and the surrounding quotes are necessary, the quotes will be present in this field.

ascii_local_part: If set, the local part, which is composed of ASCII characters only.

domain: The canonical internationalized Unicode form of the domain part of the email address. If the returned string contains non-ASCII characters, either the SMTPUTF8 feature of your mail relay will be required to transmit the message or else the email address's domain part must be converted to IDNA ASCII first: Use ascii_domain field instead.

ascii_domain: The IDNA Punycode-encoded form of the domain part of the given email address, as it would be transmitted on the wire.

domain_address: If domain literals are allowed and if the email address contains one, an ipaddress.IPv4Address or ipaddress.IPv6Address object.

smtputf8: A boolean indicating that the SMTPUTF8 feature of your mail relay will be required to transmit messages to this address because the local part of the address has non-ASCII characters (the local part cannot be IDNA-encoded). If allow_smtputf8=False is passed as an argument, this flag will always be false because an exception is raised if it would have been true.

mx: A list of (priority, domain) tuples of MX records specified in the DNS for the domain (see RFC 5321 section 5). May be None if the deliverability check could not be completed because of a temporary issue like a timeout.

mx_fallback_type: None if an MX record is found. If no MX records are actually specified in DNS and instead are inferred, through an obsolete mechanism, from A or AAAA records, the value is the type of DNS record used instead (A or AAAA). May be None if the deliverability check could not be completed because of a temporary issue like a timeout.

spf: Any SPF record found while checking deliverability. Only set if the SPF record is queried.
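One way these fields might be used when handing an address to a mail relay is sketched below. The helper address_for_submission and its policy (prefer the ASCII form, fall back to the normalized form only when the relay supports SMTPUTF8) are assumptions for illustration, not part of the library. SimpleNamespace objects stand in for returned objects so that the sketch runs on its own; their values are taken from the examples earlier in this document.

```python
from types import SimpleNamespace


def address_for_submission(info, relay_supports_smtputf8):
    """Pick the address string to pass to a mail relay (hypothetical policy)."""
    if info.ascii_email is not None:
        # An ASCII-only form exists, so it works with any relay.
        return info.ascii_email
    if relay_supports_smtputf8:
        return info.normalized
    raise ValueError("address requires SMTPUTF8, which the relay lacks")


# Stand-ins carrying field values from the examples above ("\u30c4" is the katakana character).
idn = SimpleNamespace(normalized="example@\u30c4.life",
                      ascii_email="example@xn--bdk.life", smtputf8=False)
intl_local = SimpleNamespace(normalized="\u30c4-test@joshdata.me",
                             ascii_email=None, smtputf8=True)

print(address_for_submission(idn, relay_supports_smtputf8=False))
print(address_for_submission(intl_local, relay_supports_smtputf8=True))
```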

Assumptions

By design, this validator does not pass all email addresses that strictly conform to the standards. Many email address forms are obsolete or likely to cause trouble:

  • The validator assumes the email address is intended to be usable on the public Internet. The domain part of the email address must be a resolvable domain name (see the deliverability checks described above). Most Special Use Domain Names and their subdomains, as well as domain names without a ., are rejected as a syntax error (except see the test_environment parameter above).
  • Obsolete email syntaxes are rejected: The unusual "(comment)" syntax is rejected. Extremely old obsolete syntaxes are rejected. Quoted-string local parts and domain-literal addresses are rejected by default, but there are options to allow them (see above). No one uses these forms anymore, and I can't think of any reason why anyone using this library would need to accept them.

Testing

Tests can be run using

pip install -r test_requirements.txt 
make test

Tests run with mocked DNS responses. When adding or changing tests, temporarily turn on the BUILD_MOCKED_DNS_RESPONSE_DATA flag in tests/mocked_dns_responses.py to re-build the database of mocked responses from live queries.

For Project Maintainers

The package is distributed as a universal wheel and as a source package.

To release:

  • Update CHANGELOG.md.
  • Update the version number in email_validator/version.py.
  • Make & push a commit with the new version number and make sure tests pass.
  • Make & push a tag (see command below).
  • Make a release at https://github.com/JoshData/python-email-validator/releases/new.
  • Publish a source and wheel distribution to pypi (see command below).
git tag v$(grep version setup.cfg | sed "s/.*= //")
git push --tags
./release_to_pypi.sh
