• Stars
    star
    1,154
  • Rank 39,355 (Top 0.8 %)
  • Language
    Python
  • Created over 4 years ago
  • Updated almost 2 years ago

Repository Details

Novel Coronavirus 2019 time series data on cases

COVID-19 dataset

Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths and reported recoveries. Data is disaggregated by country (and sometimes subregion). COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the more than 118,000 cases of the illness in over 110 countries and territories around the world at the time.

This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:

  • confirmed tested cases of Coronavirus infection
  • the number of people who have reportedly died while sick with Coronavirus
  • the number of people who have reportedly recovered from it

Data

Data is in CSV format and updated daily. It is sourced from this upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE) who have been doing a great public service from an early point by collating data from around the world.

We have cleaned and normalized that data, for example by tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged it as a data package.
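As a rough illustration of the kind of normalization described above, the sketch below melts a wide table (one column per date) into a long time series and parses the dates. It is not the repository's actual script: the column names, the `M/D/YY` date format, the function name `normalize_wide` and the sample values are all assumptions made up for the example.

```python
import io

import pandas as pd

# Made-up sample in an assumed wide layout: one column per date (M/D/YY).
WIDE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Examplestan,10.0,20.0,1,2,4
North,Samplonia,30.0,40.0,10,10,15
"""

ID_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def normalize_wide(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Turn one-column-per-date data into one row per (location, date)."""
    long = wide.melt(id_vars=ID_COLS, var_name="Date", value_name=value_name)
    # Tidy the dates: parse M/D/YY strings into proper datetime values.
    long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
    return long.sort_values(
        ["Country/Region", "Province/State", "Date"]
    ).reset_index(drop=True)


wide = pd.read_csv(io.StringIO(WIDE_CSV))
confirmed = normalize_wide(wide, "Confirmed")
# Dates are written back out in ISO 8601 form (YYYY-MM-DD).
print(confirmed.to_csv(index=False))
```

Calling the same hypothetical function once per input file (confirmed, deaths, recovered) and merging the results on the location and date columns would give a single consolidated table.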

You can view the data and its structure, and download it in alternative formats (e.g. JSON), from the DataHub:

https://datahub.io/core/covid-19
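Once a CSV file has been downloaded, it can be read with Pandas. The sketch below is only an illustration: the column names (`Date`, `Country`, `Confirmed`, `Recovered`, `Deaths`), the assumption that counts are cumulative, the helper name `latest_by_country` and the sample values are all made up for the example, so check them against the actual file. An in-memory string stands in for the downloaded file so that the snippet runs without network access.

```python
import io

import pandas as pd

# Made-up stand-in for a downloaded CSV; the column names are an assumption.
SAMPLE = """Date,Country,Confirmed,Recovered,Deaths
2020-03-10,Examplestan,100,10,1
2020-03-11,Examplestan,150,20,2
2020-03-10,Samplonia,30,0,0
2020-03-11,Samplonia,45,5,1
"""


def latest_by_country(source) -> pd.DataFrame:
    """Return the most recent row for each country, indexed by country."""
    df = pd.read_csv(source, parse_dates=["Date"])
    # Assuming cumulative counts, the last row per country is its running total.
    latest = df.sort_values("Date").groupby("Country").tail(1)
    return latest.set_index("Country")


# Pass a file path instead of the StringIO object to read a real download.
latest = latest_by_country(io.StringIO(SAMPLE))
print(latest[["Confirmed", "Deaths"]])
```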

Sources

The upstream dataset currently lists the following upstream data sources:

We will endeavour to provide more detail on how regularly and by which technical means the data is updated. Additional background is available in the CSSE blog, and in the Lancet paper (DOI), which includes this figure:

[Figure: countries timeline]

Preparation

This repository uses Pandas to process and normalize the data.

You first need to install the dependencies:

pip install -r scripts/requirements.txt

Then run the following scripts:

python scripts/process_worldwide.py
python scripts/process_us.py

The automated workflow (.github/workflows/actions.yml) uses Python 3.8.

License

This dataset is licensed under the Open Data Commons Public Domain Dedication and License (PDDL).

The data comes from a variety of public sources and was collated in the first instance by Johns Hopkins University on GitHub. We have used that data and processed it further. Given the public sources and the factual nature of the data, we believe that it is in the public domain and are therefore releasing the results under the Public Domain Dedication and License. We are also, of course, explicitly licensing any contribution of ours under that license.

More Repositories

1

country-codes

Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others
Python
847
star
2

awesome-data

Curated list of quality open datasets
733
star
3

s-and-p-500-companies

List of companies in the S&P 500 together with associated financials
Makefile
478
star
4

geo-countries

Country polygons as GeoJSON in a datapackage
Dockerfile
435
star
5

edgar

Securities and Exchange Commission (SEC) EDGAR database which contains regulatory filings from publicly-traded US corporations.
HTML
324
star
6

airport-codes

List of Airport codes, locations and other information around the world
Python
302
star
7

s-and-p-500

S&P 500 index data (aka Standard and Poor's index of 500 major US stocks)
Python
268
star
8

world-cities

List of major cities of the world as a datapackage
Python
220
star
9

country-list

List of all countries in the world with their ISO 2 digit codes (ISO 3166-1) as CSV and JSON
144
star
10

currency-codes

ISO 4217 List of Currencies and Currency Codes
Shell
137
star
11

un-locode

United Nations Codes for Trade and Transport Locations (UN/LOCODE) and Country Codes
Python
136
star
12

harmonized-system

HS Code as a datapackage
Python
104
star
13

population

Population figures for countries, regions (e.g. Asia) and the world.
Python
94
star
14

oil-prices

Brent crude and WTI oil prices from US EIA
Python
88
star
15

language-codes

ISO Language Codes (639-1 and 639-2)
Shell
87
star
16

gdp

Country, regional and world GDP in current US Dollars ($)
Python
70
star
17

publicbodies

A database of public bodies such as government departments, ministries etc.
Less
61
star
18

core-datasets

DataHub.io awesome datasets - curated collections of high-quality datasets organized by topic
JavaScript
57
star
19

s-and-p-500-companies-financials

List of companies in the S&P 500 (Standard and Poor's 500).
HTML
56
star
20

finance-vix

CBOE Volatility Index (VIX) time-series dataset including daily open, close, high and low.
Makefile
55
star
21

geoip2-ipv4

GeoIP2 - free IP geolocation database.
51
star
22

nasdaq-listings

Data package for Nasdaq listings
Python
44
star
23

football-datasets

Major European leagues data (England, Spain, Italy, Germany and France)
Python
40
star
24

gold-prices

Gold prices data package
Python
39
star
25

ppp

Purchasing power parity (PPP)
Python
36
star
26

imf-weo

IMF World Economic Outlook Database Data
Python
35
star
27

top-level-domain-names

The delegation details of top-level domains
34
star
28

five-thirty-eight-datasets

Over 100 datasets scraped from FiveThirtyEight
Python
34
star
29

ISO-Container-Codes

Coded list of ISO 6346 shipping containers, used in international trade and electronic shipping messages.
33
star
30

clinical-trials-us

Official US clinical trial outcomes from the FDA
JavaScript
28
star
31

investor-flow-of-funds-us

Monthly net new cash flow into various mutual fund investment classes (equities, bonds etc).
Python
27
star
32

media-types

List of MIME types, subtypes, and file name extensions.
27
star
33

emojis

Unicode Emoji as UTS #51 specification
Python
24
star
34

population-city

City population yearly timeseries for female and male, and for both sexes, collected by the United Nations Statistics Division and published by UNData.
23
star
35

commodity-prices

Monthly Prices of 53 commodities and 10 indexes from 1980 to 2016.
Python
22
star
36

house-prices-us

US House Price Indices (Case-Shiller)
Python
21
star
37

sea-level-rise

Global Mean Sea Level Rise
Python
20
star
38

inflation

Annual Inflation, GDP deflator and consumer prices
Python
19
star
39

nyse-other-listings

Data package for NYSE listings
Python
19
star
40

breast-cancer

Breast cancer occurrences.
Python
19
star
41

continent-codes

List of continents with two letter code
19
star
42

global-temp

Global Temperature Time Series
Python
19
star
43

natural-gas

Natural Gas Prices including Henry Hub
Python
19
star
44

geo-boundaries-world-110m

DEPRECATED - replaced by https://github.com/datasets/geo-countries (Map of the world's countries - vector data at 1:110m scale)
19
star
45

browser-stats

Web browser usage statistics
Python
18
star
46

exchange-rates

Foreign exchange rates from US Federal Reserve.
Python
18
star
47

corruption-perceptions-index

Corruption Perceptions Index - CPI
R
17
star
48

co2-ppm

CO2 PPM - Trends in Atmospheric Carbon Dioxide
Shell
16
star
49

bond-yields-us-10y

10 year nominal yields on US government bonds from the Federal Reserve
Python
16
star
50

cpi

Annual consumer price index datapackage for most countries in the world
Python
15
star
51

crime-uk

UK Crime data
JavaScript
13
star
52

eu-emissions-trading-system

Data about the EU emission trading system (ETS)
Python
13
star
53

employment-us

US Employment and Unemployment rates since 1940 from Bureau of Labor Statistics
Python
13
star
54

world-religion-projections

World Religion Projections (2010-2050)
12
star
55

cpi-us

US Consumer Price Index (DataHub Data Package)
Python
12
star
56

co2-fossil-by-nation

Annual info about co2 emissions per nation
Python
12
star
57

IMO-IMDG-Codes

Official IMDG Codes for use in transport of dangerous goods as described by the IMO
11
star
58

gini-index

Repository for the GINI index.
Python
11
star
59

house-prices-uk

UK house prices dataset
Python
11
star
60

co2-ppm-daily

Carbon Dioxide levels in the atmosphere (ppm on a daily basis)
Python
11
star
61

unece-units-of-measure

Standardised codes from Recommendation 20, maintained by UNECE.
Java
10
star
62

glwd

Global Lakes and Wetlands Database Levels 1 and 2 Polygons as GeoJSON (.geojson/.topojson) with original format (.shp)
10
star
63

openml-datasets

Group of most downloaded datasets extracted from https://www.openml.org
9
star
64

dac-and-crs-code-lists

Machine readable DAC CRS codelists
Python
9
star
65

geo-nuts-administrative-boundaries

Datapackage for NUTS admin levels 1, 2 and 3 edition 2010
JavaScript
9
star
66

glacier-mass-balance

Average cumulative mass balance of "reference" Glaciers worldwide
Python
8
star
67

fips-10-4

List of FIPS (Federal Information Processing Standards) region codes
Python
8
star
68

euribor

Euribor rates by year and granularity.
Python
8
star
69

gdp-us

Gross Domestic Product of the United States (US GDP)
Python
8
star
70

opented

Tenders Electronic Daily (TED) - OpenTED
JavaScript
8
star
71

cpi-gb

Consumer Price Index (and hence inflation) for the UK from 1850 to the present (monthly since June 1947).
JavaScript
7
star
72

cofog

Classifications of Functions of Government
Python
7
star
73

geo-boundaries-us-110m

Internal, first-order administrative boundaries and polygons for the United States in .shp, .geojson, and .topojson.
7
star
74

datacatalogs.org

Data from DataCatalogs.org
Python
7
star
75

exchange-rates-usd

Exchange Rates Data Package
Python
6
star
76

land-matrix

land-matrix
6
star
77

speed-dating

Data was gathered from participants in experimental speed dating events from 2002-2004
Python
6
star
78

population-growth-estimates-and-projections

Total Population
Python
6
star
79

unece-package-codes

Coded representations of the package type names used in International Trade (UNECE/CEFACT Trade Facilitation Recommendation No.21)
6
star
80

co2-fossil-global

Global CO2 emissions from fossil fuels, annually from 1751 to 2014.
6
star
81

lme-large-marine-ecosystems

LME (Large Marine Ecosystems) global dataset; originally .kml (.kmz), and .shp formats, converted to .geojson/.topojson
6
star
82

world-wealth-and-income-database

World Wealth and Income Database (formerly World Top Incomes Database). Database of income shares of top end of population for long time periods (e.g. 1875-present) for a variety of countries around the world.
6
star
83

population-global-historical

Global historical population data
Python
6
star
84

smdg-master-terminal-facilities-list

List maintained by the SMDG Secretariat to specify the port terminal facilities in UN/EDIFACT messages.
5
star
85

pharmaceutical-drug-spending

Pharmaceutical Drug Spending by countries
Python
5
star
86

bond-yields-gov-long-term

Long term government bond yields
5
star
87

zopa

Data on interest rate and risk (default rates) at ZOPA, the peer-to-peer marketplace for money.
Python
5
star
88

ICC-Incoterms

International Commercial Terms (‘Incoterms’) are internationally recognised standard trade terms used in sales contracts.
5
star
89

global-temp-anomalies

Data about global annual anomalies
Python
5
star
90

house-prices-global

Residential property price statistics from different countries (from bis.org)
Python
4
star
91

geo-admin1-us

Natural Earth admin1 in the USA
4
star
92

eeg-eye-state

EEG measurements where the output is whether eye was open or not
Python
4
star
93

dermatology

Patients with dermatology illnesses.
Python
4
star
94

cervical-cancer

Cervical cancer occurrences
Python
4
star
95

cash-surplus-deficit

Cash Surplus/Deficit (% of GDP), from 1990 to 2013
Python
4
star
96

geo-ne-admin1

Test of a datapackage for Natural Earth admin1
3
star
97

primary-tumor

Primary tumors in people
Python
3
star
98

gdp-uk

UK GDP
Shell
3
star
99

genome-sequencing-costs

Costs associated with DNA sequencing since 2001
Python
3
star
100

london-crime

Python
3
star