Dateparser

Python parser for human readable dates


Key Features • How To Use • Installation • Common use cases • You may also like... • License

Key Features

  • Support for almost every existing date format: absolute dates, relative dates ("two weeks ago" or "tomorrow"), timestamps, etc.
  • Support for more than 200 language locales.
  • Language autodetection.
  • Customizable behavior through settings.
  • Support for non-Gregorian calendar systems.
  • Support for dates with timezone abbreviations or UTC offsets ("August 14, 2015 EST", "21 July 2013 10:15 pm +0500", ...).
  • Search for dates in longer texts.

Online demo

Do you want to try it out without installing any dependencies? You can test it quickly by visiting this online demo!

How To Use

The most straightforward way to parse dates with dateparser is to use the dateparser.parse() function, which wraps most of the functionality of the module.

>>> import dateparser

>>> dateparser.parse('Fri, 12 Dec 2014 10:55:50')
datetime.datetime(2014, 12, 12, 10, 55, 50)

>>> dateparser.parse('1991-05-17')
datetime.datetime(1991, 5, 17, 0, 0)

>>> dateparser.parse('In two months')  # today is 1st Aug 2020
datetime.datetime(2020, 10, 1, 11, 12, 27, 764201)

>>> dateparser.parse('1484823450')  # timestamp
datetime.datetime(2017, 1, 19, 10, 57, 30)

>>> dateparser.parse('January 12, 2012 10:00 PM EST')
datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<StaticTzInfo 'EST'>)

As you can see, dateparser works with different date formats, but it can also be used directly with strings in different languages:

>>> dateparser.parse('Martes 21 de Octubre de 2014')  # Spanish (Tuesday 21 October 2014)
datetime.datetime(2014, 10, 21, 0, 0)

>>> dateparser.parse('Le 11 Décembre 2014 à 09:00')  # French (11 December 2014 at 09:00)
datetime.datetime(2014, 12, 11, 9, 0)

>>> dateparser.parse('13 января 2015 г. в 13:34')  # Russian (13 January 2015 at 13:34)
datetime.datetime(2015, 1, 13, 13, 34)

>>> dateparser.parse('1 เดือนตุลาคม 2005, 1:00 AM')  # Thai (1 October 2005, 1:00 AM)
datetime.datetime(2005, 10, 1, 1, 0)

>>> dateparser.parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago), current time: 12:46
datetime.datetime(2019, 9, 7, 13, 46)

>>> dateparser.parse('2小时前')  # Chinese (2 hours ago), current time: 22:30
datetime.datetime(2018, 5, 31, 20, 30)

You can control multiple behaviors by using the settings parameter:

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YMD'})
datetime.datetime(2014, 10, 12, 0, 0)

>>> dateparser.parse('2014-10-12', settings={'DATE_ORDER': 'YDM'})
datetime.datetime(2014, 12, 10, 0, 0)

>>> dateparser.parse('1 year', settings={'PREFER_DATES_FROM': 'future'})  # Today is 2020-09-23
datetime.datetime(2021, 9, 23, 0, 0)

>>> dateparser.parse('tomorrow', settings={'RELATIVE_BASE': datetime.datetime(1992, 1, 1)})
datetime.datetime(1992, 1, 2, 0, 0)

To see more examples on how to use the settings, check the settings section in the docs.

False positives

dateparser will do its best to return a date, dealing with multiple formats and different locales. For that reason, it is important that the input is a valid date; otherwise, it can return false positives.

To reduce the possibility of receiving false positives, make sure that:

  • The input string is a valid date and doesn't contain any other words or numbers.
  • If you know the language or languages beforehand, pass them through the languages or locales parameters.

On the other hand, if you want to exclude any of the default parsers (timestamp, relative-time, ...) or change the order in which they are executed, you can do so through the PARSERS setting.

Installation

Dateparser supports Python >= 3.7. You can install it with:

$ pip install dateparser

If you want to use the Jalali or Hijri calendar, you need to install the calendars extra:

$ pip install dateparser[calendars]

Common use cases

dateparser can be used for a wide variety of purposes, but it stands out when it comes to:

Consuming data from different sources:

  • Scraping: extracting dates from different places, in several different formats and languages.
  • IoT: consuming data coming from different sources with different date formats.
  • Tooling: consuming dates from different logs / sources.
  • Format transformations: transforming dates coming from different files (PDF, CSV, etc.) to other formats (databases, etc.).

Offering natural interaction with users:

  • Tooling and CLI: allow users to write "3 days ago" to retrieve information.
  • Search engines: allow people to search by date in an easier, more natural format.
  • Bots: allow users to interact with a bot easily.

You may also like...

  • price-parser - A small library for extracting price and currency from raw text strings.
  • number-parser - Library to convert numbers written in natural language to their equivalent numeric forms.
  • Scrapy - Web crawling and web scraping framework

License

BSD 3-Clause
