
Coronavirus (COVID-19) UK Historical Data


โš ๏ธ Update: 1 August 2020. This repository is deprecated and is no longer updated. Users are encouraged to move to official upstream data sources which are listed below โš ๏ธ

Data on numbers of tests, confirmed cases, and deaths for coronavirus (COVID-19) in the UK is published by the government, but it is fragmented and not always provided in consistent or machine-friendly formats. Also, in many cases only the latest numbers are available so it's not possible to look at changes over time.

This site collates the historical data and provides it in an easily consumable format (CSV), in both wide and tidy data forms.

Ideally the data publishers will start doing this so this site becomes redundant.

Data files

The following CSV files are available (note they are no longer updated):

  • data/covid-19-cases-uk.csv: daily counts of confirmed cases for (upper tier) local authorities in England, health boards in Scotland and Wales, and local government district for Northern Ireland.
    • Note that prior to 18 March 2020 Wales data was broken down by local authority, not health board, and prior to 27 March 2020 there were no breakdowns by area for Northern Ireland.
  • data/covid-19-totals-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK
  • data/covid-19-totals-england.csv: daily counts of tests, confirmed cases, deaths for England
  • data/covid-19-totals-northern-ireland.csv: daily counts of tests, confirmed cases, deaths for Northern Ireland
  • data/covid-19-totals-scotland.csv: daily counts of tests, confirmed cases, deaths for Scotland
  • data/covid-19-totals-wales.csv: daily counts of tests, confirmed cases, deaths for Wales
  • data/covid-19-indicators-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK and individual countries in the UK (England, Scotland, Wales, Northern Ireland). This is a tidy-data version of covid-19-totals-*.csv combined into one file.
  • data/daily/*.csv: daily counts, with a separate file for each date and country.
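
As a rough illustration of how the tidy file relates to the wide per-country files, here is a minimal sketch that pivots tidy rows into one row per date. The Date, Country and Indicator column names are taken from the csvs-to-sqlite commands later in this README; the Value column name, the indicator labels and all the numbers are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in tidy form: one row per (Date, Country, Indicator).
# The "Value" column name and all numbers here are illustrative assumptions.
TIDY_SAMPLE = """Date,Country,Indicator,Value
2020-03-27,Wales,Tests,100
2020-03-27,Wales,ConfirmedCases,10
2020-03-27,Wales,Deaths,1
2020-03-28,Wales,Tests,150
2020-03-28,Wales,ConfirmedCases,15
2020-03-28,Wales,Deaths,2
"""


def tidy_to_wide(tidy_text, country):
    """Pivot tidy rows for one country into one dict per date."""
    by_date = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(tidy_text)):
        if row["Country"] == country:
            by_date[row["Date"]][row["Indicator"]] = int(row["Value"])
    # ISO dates sort chronologically when compared as strings.
    return [{"Date": date, **values} for date, values in sorted(by_date.items())]


wide_rows = tidy_to_wide(TIDY_SAMPLE, "Wales")
for row in wide_rows:
    print(row)
```

To try this against the real file, replace the sample string with the contents of data/covid-19-indicators-uk.csv, after checking the actual header row.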

Interpreting the numbers (more information on this DHSC/PHE page, and the PHE dashboard about page)

  • "Tests" are the number of people tested, not the number of samples tested.
  • "Confirmed cases" are the number of people with a positive test.
  • "Deaths" are hospital deaths, so they don't include deaths of people with COVID-19 who died at home for example. (Although this changed in England on 29 April 2020.)

Note that the totals for the UK don't necessarily equal the sum of the totals of the four nations (England, Scotland, Wales, Northern Ireland), due to differences in date reported.
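
A quick way to see the size of this discrepancy for a given date is to subtract the sum of the four nations from the UK figure. The sketch below uses made-up numbers; the dictionary layout and the function name are assumptions for illustration, not part of the tools in this repository.

```python
# Hypothetical totals for a single date; the numbers are made up.
totals = {
    "UK": 1000,
    "England": 800,
    "Scotland": 90,
    "Wales": 60,
    "Northern Ireland": 40,
}

NATIONS = ("England", "Scotland", "Wales", "Northern Ireland")


def uk_minus_nations(totals_by_country):
    """Return the UK total minus the sum of the four nations' totals."""
    nations_sum = sum(totals_by_country[nation] for nation in NATIONS)
    return totals_by_country["UK"] - nations_sum


difference = uk_minus_nations(totals)
print(f"UK total differs from the sum of the nations by {difference}")
```

A non-zero difference is not necessarily an error, since the nations may report for different dates.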

You can use these files without reading the rest of this document.

There is an experimental Datasette instance hosting the data. This is useful for running simple SQL on the data, or exporting in JSON format.

News

  • 1 August 2020. Retired this repo. See discussion here.
  • 2 July 2020. PHE started including Pillar 2 data in England confirmed case numbers. This data is now being included in this repository.
  • 1 July 2020. England UTLA confirmed case data is no longer being included since it doesn't have Pillar 2 tests, which make up the vast majority of tests.
  • 1 July 2020. NI data is no longer being included since the (undocumented) backend API changed again, and the NI Department of Health does not provide a machine-readable alternative. (See 2 June 2020 entry below.)
  • 30 June 2020. With the new Leicester lockdown, media attention around the lack of Pillar 2 data in England has increased. I have added a prominent warning to the top of this README.
  • 2 June 2020. I received a reply from the NI Department of Health to my enquiry about making machine-readable downloads available. For this reason I may stop collating NI data in this repository, since the JSON API the code uses is undocumented and changes from time to time. See #63.

Mr White

Thank you for your query. Currently, the information on which the dashboard statistics are based is being drawn from live systems and the data is continually being revised. This means that we do not at this time feel it would be appropriate to provide data that is still volatile and is subject to both revision and change.

Regards

Information and Analysis Directorate

  • 28 May 2020. DHSC is now providing a timeseries of testing data, linked to from this DHSC/PHE page.
  • 23 May 2020. DHSC is no longer reporting the number of people tested (daily or cumulative) in Pillar 2, hence it is not possible to give an overall total.
  • 12 May 2020. The PHW dashboard data download link is no longer static - it changes every day, and there is no easy way to retrieve it, since it is dynamically generated in Tableau.
  • 1 May 2020. The NI Department of Health dashboard has been re-instated.
  • 28 April 2020. The NI Department of Health is no longer reporting the number of people tested, just the number of tests.
  • 21 April 2020. The PHA NI dashboard was suspended since it was reporting incorrect data. Test and total confirmed case numbers are being announced on Twitter by @healthdpt. Area breakdowns are no longer being provided.
  • 21 April 2020. The PHW dashboard now has a link to download the data in XLSX format. The URL is dynamically generated however, so it's still not easy to automate the download.
  • 20 April 2020. The PHE dashboard now has stable URLs for its CSV downloads.
  • 18 April 2020. PHA NI launched a dashboard to replace the daily surveillance reports.
  • 15 April 2020. A new dashboard for UK and England was launched, replacing the ArcGIS one. As a part of this change the XLSX/CSV files for daily indicators, and case counts by region and UTLA (in England) are no longer being produced. They have been replaced by CSV files, or - for programmatic access - a JSON feed.
  • 14 April 2020. No per-area case numbers produced for NI, even though it is a weekday (Tuesday). Yesterday was a bank holiday, and no case numbers were produced either.
  • 9 April 2020. The reporting period for case numbers in Wales changed. "For operational reasons, we are moving the point at which we count new cases of Novel Coronavirus (Covid-19) back from 7pm to 1pm. Case numbers on Thursday [9 April] will therefore be lower than usual, and will return to normal on Friday [10 April]."
  • 8 April 2020. Scotland started publishing numbers for people in hospital and intensive care, by health board. They also started reporting numbers that were less than 5 as "*".
  • 6 April 2020. Wales published a new interactive dashboard, which gives data for confirmed cases, and testing episodes, broken down by local authority and health board. There is historical data too. Unfortunately there is currently no way of exporting the raw data from the dashboard.
  • 2 April 2020. Scotland reported a more timely process for counting deaths.
  • 29 March 2020. There's a new spreadsheet that includes historical data for the dashboard. This includes cases (by country, English UTLA, English NHS region), deaths (by country), and recovered patients (although this isn't being updated at the time of writing).
  • 27 March 2020. UK daily indicators now include number of deaths for UK, England, Scotland, Wales, and Northern Ireland.
  • 26 March 2020. Northern Ireland's Public Health Agency (PHA) started publishing confirmed cases by Local Government District (LGD) on weekdays.
  • 25 March 2020. The reporting period for number of deaths changed. Previously it was for the 24 hour period starting and ending at 9am. The new period starts and ends at 5pm, and is reported the following afternoon at 2pm. (So the number of deaths reported on 25 March (cumulative total 463) represents the period 9am to 5pm on 24 March.) The testing and case numbers continue to be the 9am period.
  • 24 March 2020. Northern Ireland's Public Health Agency (PHA) started producing a Daily COVID-19 Surveillance Bulletin in PDF form. It contains test numbers (also broken down by Health and Social Care Trust), and case numbers but only on a choropleth map (and broken down by age and gender).
  • 21 March 2020. PHW is back to health board (not LA) breakdowns again, this time it looks permanent.
  • 20 March 2020. PHW is providing LA area breakdowns again, after not doing so for two days.
  • 18 March 2020. PHW is no longer providing LA area breakdowns. "Novel Coronavirus (COVID-19) is now circulating in every part of Wales. For this reason, we will not be reporting cases by local authority area from today. From tomorrow, we will update daily at 12 noon the case numbers by health board of residence."
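
The 8 April 2020 entry above means that a count in the Scottish data may be the literal string "*" rather than a number. One way to handle this when reading the data is to map suppressed values to None instead of guessing a number. This is a minimal sketch; the function name is hypothetical and it is not taken from the tools in this repository.

```python
def parse_count(cell):
    """Return the count as an int, or None when it is suppressed as "*"."""
    text = cell.strip()
    if text == "*":
        # Suppressed small number: the true value is unknown.
        return None
    return int(text)


parsed = [parse_count(cell) for cell in ["12", "*", " 7 "]]
print(parsed)
```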

Data sources

The following sources may include more data than described here. This summary includes only Tests, Confirmed cases and Deaths.

UK

  • Source: UK testing time series (CSV)
    • Tests: number of people tested (Pillar 1 only) by day in UK, England, Scotland, NI; (Pillar 1 and 2) Wales
    • Confirmed cases: number of confirmed cases (Pillar 1 and 2) by day in UK, England, Scotland, Wales, NI
  • Source: UK daily deaths time series (CSV)
    • Deaths: number of deaths by day in UK
  • Source: UK dashboard deaths (CSV) (JSON)
    • Deaths: number of deaths by day in UK, England, Scotland, Wales, NI
  • Charts available on the PHE dashboard
  • Twitter updates: @DHSCgovuk

England

  • Source: UK dashboard cases (CSV) (JSON)
    • Confirmed cases: number of confirmed cases (Pillar 1 and 2) by day in England, regions, UTLAs, LTLAs
  • Charts available on the PHE dashboard
  • Twitter updates: @PHE_uk

Scotland

  • Source: Trends in daily COVID-19 data (XLSX) (CSVs)
    • Tests: number of people tested (Pillar 1, and Pillar 2 since 15 June) by day in Scotland (CSV)
    • Confirmed cases: number of confirmed cases (Pillar 1, and Pillar 2 since 15 June) by day in Scotland (CSV)
    • Deaths: number of deaths by day in Scotland (CSV)
  • Source: COVID-19 data by NHS Board (XLSX) (CSV)
    • Confirmed cases: number of confirmed cases (Pillar 1, and Pillar 2 since 15 June) by day by health board
  • See also statistics.gov.scot
  • Charts available on the PHS dashboard
  • Twitter updates: @scotgov

Wales

  • Source: Data download (XLSX)
    • Tests: number of people tested (Pillars 1 and 2) by day by local authority
    • Confirmed cases: number of confirmed cases (Pillars 1 and 2) by day by local authority
    • Deaths: number of deaths by day in Wales; number of cumulative deaths by health board
  • More information and charts available on the PHW dashboard
  • Twitter updates: @PublicHealthW

Northern Ireland

  • Source: No machine-readable dataset available
  • Charts available on the Department of Health dashboard
    • Includes number of people tested and confirmed cases for Pillar 1, and Pillar 2 since 24 June.
  • Twitter updates: @healthdpt

Local Authority and Health Board metadata

Related projects/datasets

Wishlist

Here are my suggestions for how to improve the data being published by public bodies.

The short version: publish everything in CSV format, and include historical data!

  • Public Health Agency, Northern Ireland: Provide a machine readable version of the historical data on the dashboard.

The reporting systems have changed a lot since the outbreak began, and overall they have improved, both in the amount of information being published and in the ease of access to machine-readable datasets. (Public Health Scotland provides all their data in XLSX and CSV format, including historical data. Public Health Wales provides an XLSX spreadsheet with historical data.)

Tools

There are command line tools for downloading, parsing, and processing the data. They rely on Python 3.

To install the tools, create a virtual environment, activate it, then install the required packages:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Daily workflow

A sqlite DB is now used to store and aggregate intermediate data. The CSV files remain the point of record.

The crawl tool checks whether the resource (web page or data file) for the specified date (today) has already been downloaded. If it hasn't, the tool downloads it if it is available, and exits if it is not. Once the resource is available, the tool extracts the relevant information from it and updates the sqlite database. This means that you can just keep running crawl until it finds new updates.
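
The control flow described above can be sketched as follows. This is not the code in tools/crawl.py: every name here (crawl, fetch, extract, store, the cache file naming) is hypothetical, and the sketch assumes that a cached resource is re-extracted on each run.

```python
import tempfile
from pathlib import Path


def crawl(date, country, cache_dir, fetch, extract, store):
    """Sketch of the crawl flow; all names are hypothetical.

    fetch(date, country) returns the raw resource text, or None if the
    source has not published it yet. extract and store stand in for the
    parsing step and the sqlite update step.
    """
    cached = Path(cache_dir) / f"{country}-{date}.raw"
    if cached.exists():
        # Already downloaded: reuse the local copy.
        raw = cached.read_text()
    else:
        raw = fetch(date, country)
        if raw is None:
            # Not available yet: stop so the caller can try again later.
            return False
        cached.write_text(raw)
    store(date, country, extract(raw))
    return True


# Usage with stand-in callables instead of real downloads and sqlite.
db = {}


def store(date, country, value):
    db[(date, country)] = value


with tempfile.TemporaryDirectory() as tmp:
    # First run: the source has nothing for this date yet.
    first = crawl("2020-04-06", "Wales", tmp, lambda d, c: None, int, store)
    # Second run: the source now returns a (made-up) value.
    second = crawl("2020-04-06", "Wales", tmp, lambda d, c: "42", int, store)

print(first, second, db)
```

Returning False when nothing is available is what makes repeated runs safe: nothing is written until the source actually publishes.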

The convert_sqlite_to_csvs tool will extract the data from sqlite and update the CSV files.

The update tool runs crawl then convert_sqlite_to_csvs, and interactively prompts you to commit the changes to git.

There is also a crawl_all tool (and corresponding update_all tool) that uses machine-readable sources to update all historical data for that source. This is not available for all sources yet.

./tools/update_all.sh phw
./tools/update_all.sh phs
./tools/update.sh NI
./tools/update.sh UK
./tools/update_all.sh phe

The equivalent done manually (just for Wales):

DATE=$(date +'%Y-%m-%d')
./tools/crawl.py $DATE Wales
./tools/convert_sqlite_to_csvs.py
git add data/; git commit -am "Update for $DATE for Wales"

NI updates are being done manually since there are currently no machine-readable sources.

# edit covid-19-totals-northern-ireland.csv and add tests/cases/deaths
./tools/convert_totals_to_indicators.py
csvs-to-sqlite --replace-tables -t indicators -pk Date -pk Country -pk Indicator data/covid-19-indicators-uk.csv data/covid-19-uk.db
./tools/convert_sqlite_to_csvs.py
git commit -a # "Update for xxx for NI from https://twitter.com/healthdpt"

Updates are not always made at a consistent time of day, so the following command can be run continuously in a terminal to check for updates every 10 minutes. The -b option makes it beep if there is a new update.

watch -n 600 -b ./tools/crawl.py

Check data consistency

./tools/check_indicators.py
./tools/check_totals.py

Manual overrides

Sometimes it's necessary to fix data by hand. In this case the following tools are useful:

Repopulate the sqlite database from the CSV files:

rm data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t indicators -pk Date -pk Country -pk Indicator data/covid-19-indicators-uk.csv data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t cases -pk Date -pk Country -pk AreaCode -pk Area data/covid-19-cases-uk.csv data/covid-19-uk.db
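
Once the database has been rebuilt, it can be queried with Python's built-in sqlite3 module. The sketch below builds a small in-memory table so that it runs on its own. The table name and the Date, Country and Indicator key columns follow the csvs-to-sqlite command above, while the Value column name and the sample rows are assumptions.

```python
import sqlite3

# In-memory stand-in for data/covid-19-uk.db; to use the real file, call
# sqlite3.connect("data/covid-19-uk.db") and skip the CREATE/INSERT steps.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE indicators (
           Date TEXT, Country TEXT, Indicator TEXT, Value INTEGER,
           PRIMARY KEY (Date, Country, Indicator))"""
)
# Made-up sample rows; "Value" is an assumed column name.
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        ("2020-03-27", "Wales", "Deaths", 1),
        ("2020-03-28", "Wales", "Deaths", 2),
        ("2020-03-28", "Scotland", "Deaths", 3),
    ],
)


def indicator_series(connection, country, indicator):
    """Return (Date, Value) rows for one country and indicator, by date."""
    cursor = connection.execute(
        "SELECT Date, Value FROM indicators "
        "WHERE Country = ? AND Indicator = ? ORDER BY Date",
        (country, indicator),
    )
    return cursor.fetchall()


wales_deaths = indicator_series(conn, "Wales", "Deaths")
print(wales_deaths)
```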
