• This repository has been archived on 24/Oct/2022
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 4 years ago
  • Updated over 1 year ago

Repository Details

Datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world.

Official Site

Please refer to the official site for this repository for visualizations and other relevant information: https://health.google.com/covid-19/open-data/

Repository No Longer Updated

As of September 15, 2022, we will be turning off real-time updates in this repository, and converting the repository to a retrospective one. The data will continue to be available without interruption for the foreseeable future at the existing location, but it will not be updated further. Users who wish to continue to receive updates are encouraged to inspect our data sources, or clone the code and run the data pipelines locally.

COVID-19 Open-Data

This repository attempts to assemble the largest COVID-19 epidemiological database, in addition to a powerful set of expansive covariates. It includes open, publicly sourced, licensed data relating to demographics, economy, epidemiology, geography, health, hospitalizations, mobility, government response, weather, and more. Moreover, the data merges daily time-series and 20,000+ global sources, at a fine spatial resolution, using a consistent set of region keys. All regions are assigned a unique location key, which resolves discrepancies between ISO / NUTS / FIPS codes, etc. The different aggregation levels are:

  • 0: Country
  • 1: Province, state, or local equivalent
  • 2: Municipality, county, or local equivalent
  • 3: Locality which may not follow strict hierarchical order, such as "city" or "nursing homes in X location"

There are multiple types of data:

  • Outcome data Y(i,t), such as cases, tests, hospitalizations, deaths and recoveries, for region i and time t
  • Static covariate data X(i), such as population size, health statistics, economic indicators, geographic boundaries
  • Dynamic covariate data X(i,t), such as mobility, search trends, weather, and government interventions

The data is drawn from multiple sources, as listed below, and stored in separate tables as CSV files grouped by context, which can be easily merged due to the use of consistent geographic (and temporal) keys as it is done for the aggregated table.

| Table | Keys¹ | Content | URL | Source² |
| --- | --- | --- | --- | --- |
| Aggregated | [key][date] | Flat, compressed table with records from (almost) all other tables joined by date and/or key; see below for more details | aggregated.csv | All tables below |
| Index | [key] | Various names and codes, useful for joining with other datasets | index.csv, index.json | Wikidata, DataCommons, Eurostat |
| Demographics | [key] | Various (current³) population statistics | demographics.csv, demographics.json | Wikidata, DataCommons, WorldBank, WorldPop, Eurostat |
| Economy | [key] | Various (current³) economic indicators | economy.csv, economy.json | Wikidata, DataCommons, Eurostat |
| Epidemiology | [key][date] | COVID-19 cases, deaths, recoveries and tests | epidemiology.csv, epidemiology.json | Various² |
| Emergency Declarations | [key][date] | Government emergency declarations and mitigation policies | lawatlas-emergency-declarations.csv | LawAtlas Project |
| Geography | [key] | Geographical information about the region | geography.csv, geography.json | Wikidata |
| Health | [key] | Health indicators for the region | health.csv, health.json | Wikidata, WorldBank, Eurostat |
| Hospitalizations | [key][date] | Information related to patients of COVID-19 and hospitals | hospitalizations.csv, hospitalizations.json | Various² |
| Mobility | [key][date] | Various metrics related to the movement of people. To download or use the data, you must agree to the Google Terms of Service. | mobility.csv, mobility.json | Google |
| Search Trends | [key][date] | Trends in symptom search volumes due to COVID-19. To download or use the data, you must agree to the Google Terms of Service. | google-search-trends.csv | Google |
| Vaccination Access | [place_id] | Metrics quantifying access to COVID-19 vaccination sites. To download or use the data, you must agree to the Google Terms of Service. | facility-boundary-us-all.csv | Google |
| Vaccination Search | [key][date] | Trends in Google searches for COVID-19 vaccination information. To download or use the data, you must agree to the Google Terms of Service. | Global-vaccination-search-insights.csv | Google |
| Vaccinations | [key][date] | Trends in persons vaccinated and population vaccination rate regarding various COVID-19 vaccines. | vaccinations.csv | Google |
| Government Response | [key][date] | Government interventions and their relative stringency | oxford-government-response.csv, oxford-government-response.json | University of Oxford |
| Weather | [key][date] | Dated meteorological information for each region | weather.csv | NOAA |
| WorldBank | [key] | Latest record for each indicator from WorldBank for all reporting countries | worldbank.csv, worldbank.json | WorldBank |
| By Age | [key][date] | Epidemiology and hospitalizations data stratified by age | by-age.csv, by-age.json | Various² |
| By Sex | [key][date] | Epidemiology and hospitalizations data stratified by sex | by-sex.csv, by-sex.json | Various² |

1 key is a unique string for the specific geographical region built from a combination of codes such as ISO 3166, NUTS, FIPS and other local equivalents.
2 Refer to the data sources for specifics about each data source and the associated terms of use.
3 Datasets without a date column contain the most recently reported information for each datapoint to date.

For more information about how to use these files see the section about using the data, and for more details about each dataset see the section about understanding the data.

Why another dataset?

There are many other public COVID-19 datasets. However, we believe this dataset is unique in the way that it merges multiple global sources, at a fine spatial resolution, using a consistent set of region keys, in a way that we hope makes it easy to use. Most importantly, we are committed to transparency regarding open, public, and licensed data sources. Lastly, the code for ingesting and merging the data is easy to understand and modify.

Explore the data

A simple visualization tool, the Open COVID-19 Explorer, was built to explore the Open COVID-19 datasets.

A variety of other community-contributed visualization tools are listed below.

  • See the COVID19 Data Block made by the Looker team.
  • If you want to see interactive charts with a unique UX, don't miss what @Mahks built using the Open COVID-19 dataset.
  • You can also check out the great work of @quixote79, a MapBox-powered interactive map site.
  • Experience clean, clear graphs with smooth animations thanks to the work of @jmullo.
  • Become an armchair epidemiologist with the COVID-19 timeline simulation tool built by @LeviticusMB.
  • Whether you want an interactive map, compare stats or look at charts, @saadmas has you covered with a COVID-19 Daily Tracking site.
  • Compare per-million data at Omnimodel thanks to @OmarJay1.
  • Look at responsive, comprehensive charts thanks to the work of @davidjohnstone.
  • Reproduction Live lets you track COVID-19 outbreaks in your region and visualise the spread of the virus over time.

Use the data

The data is available as CSV and JSON files, which are published in Google Cloud Storage so they can be served directly to JavaScript applications without the need for a proxy to set the correct headers for CORS and content type.

For the purpose of making the data as easy to use as possible, there is an aggregated table which contains the columns of all other tables joined by key and date. However, performance-wise, it may be better to download the data separately and join the tables locally.
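As a rough illustration of joining the tables locally, the sketch below merges a dated table with a static one using only the Python standard library. The inline CSV text stands in for downloaded files; its column names (key, date, new_confirmed, population) and all of its values are illustrative placeholders and assumptions, not real records or a confirmed schema.

```python
import csv
import io
from typing import Dict, List

# Placeholder samples standing in for downloaded files. Column names and
# values are assumptions made for this sketch, not real data.
EPIDEMIOLOGY_CSV = """key,date,new_confirmed
AU,2020-04-01,100
AU,2020-04-02,120
US,2020-04-01,500
"""

DEMOGRAPHICS_CSV = """key,population
AU,25000000
US,330000000
"""


def join_tables(dated_csv: str, static_csv: str) -> List[Dict[str, str]]:
    """Left-join a [key][date] table with a [key] table on the key column."""
    # Index the static table by key so each dated row is matched in O(1).
    static_by_key = {
        row["key"]: row for row in csv.DictReader(io.StringIO(static_csv))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(dated_csv)):
        # Keys missing from the static table simply contribute no columns.
        extra = static_by_key.get(row["key"], {})
        joined.append({**extra, **row})
    return joined


rows = join_tables(EPIDEMIOLOGY_CSV, DEMOGRAPHICS_CSV)
for row in rows:
    print(row["key"], row["date"], row["new_confirmed"], row["population"])
```

Because the static table is indexed once by key, the join needs a single pass over the dated rows, which is the larger of the two tables in this layout.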

Each region has its own version of the aggregated table, so you can pull all the data for a specific region using a single endpoint. The URL for each region is:

  • Data for key in CSV format: https://storage.googleapis.com/covid19-open-data/v3/location/${key}.csv
  • Data for key in JSON format: https://storage.googleapis.com/covid19-open-data/v3/location/${key}.json
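The URL pattern above can be wrapped in a small helper. This is a minimal sketch; the names BASE_URL and location_url are hypothetical and not part of the project.

```python
# Base path taken from the per-region URLs listed above.
BASE_URL = "https://storage.googleapis.com/covid19-open-data/v3"


def location_url(key: str, fmt: str = "csv") -> str:
    """Return the URL of the aggregated table for a single location key."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/location/{key}.{fmt}"


print(location_url("AU"))
print(location_url("AU", "json"))
```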

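Each table has a full version as well as subsets with only the last day of data. The full version is accessible at the URL described in the table above. The subsets can be found by inserting latest into the path. For example, the subsets of the epidemiology table are available at the following locations:

  • https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv
  • https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.json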

Please note that the aggregated table is not compressed for the latest subset, so the URL is https://storage.googleapis.com/covid19-open-data/v3/latest/aggregated.csv.

Note that the latest version contains the last non-null record for each key. All of the above listed tables have a corresponding JSON version; simply replace csv with json in the link.
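The two URL rules described here (insert latest into the path, swap csv for json) can be sketched as plain string operations. The helper names latest_url and json_url are hypothetical, and the sketch assumes the table file sits directly under the versioned path.

```python
def latest_url(table_url: str) -> str:
    """Insert 'latest' just before the file name of a full-table URL."""
    base, _, filename = table_url.rpartition("/")
    return f"{base}/latest/{filename}"


def json_url(csv_url: str) -> str:
    """Replace a trailing .csv extension with .json."""
    if not csv_url.endswith(".csv"):
        raise ValueError("expected a URL ending in .csv")
    return csv_url[: -len(".csv")] + ".json"


full = "https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv"
print(latest_url(full))
print(json_url(latest_url(full)))
```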

If you are trying to use this data alongside your own datasets, then you can use the Index table to get access to the ISO 3166 / NUTS / FIPS code, although administrative subdivisions are not consistent among all reporting regions. For example, for the intra-country reporting, some EU countries use NUTS2, others NUTS3 and many ISO 3166-2 codes.

You can find several examples in the examples subfolder with code showcasing how to load and analyze the data for several programming environments. If you want the short version, here are a few snippets to get started.

BigQuery

This dataset is part of the BigQuery Public Datasets Program, so you may use BigQuery to run SQL queries directly from the online query editor free of charge.

Google Colab

You can use Google Colab if you want to run your analysis without having to install anything on your computer. Simply go to this URL: https://colab.research.google.com/github/GoogleCloudPlatform/covid-19-open-data.

Google Sheets

You can import the data directly into Google Sheets, as long as you stay within the size limits. For instance, the following formula loads the latest epidemiology data into the current sheet:

=IMPORTDATA("https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv")

Note that Google Sheets has a size limitation, so only data from the latest subfolder can be imported automatically. To work around that, simply download the file and import it via the File menu.

R

If you prefer R, then this is all you need to do to load the epidemiology data:

data <- read.csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")

Python

In Python, you need to have the package pandas installed to get started:

import pandas
data = pandas.read_csv("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.csv")

jQuery

Loading the JSON file using jQuery can be done directly from the output folder. This code snippet loads the epidemiology table into the data variable:

$.getJSON("https://storage.googleapis.com/covid19-open-data/v3/epidemiology.json", data => { ... });

PowerShell

You can also use PowerShell to get the latest data for a country directly from the command line. For example, to query the latest epidemiology data for Australia:

Invoke-WebRequest 'https://storage.googleapis.com/covid19-open-data/v3/latest/epidemiology.csv' | ConvertFrom-Csv | `
    where key -eq 'AU' | select date,cumulative_confirmed,cumulative_deceased,cumulative_recovered

Understand the data

Make sure that you are using the URL listed in the table above and not the raw GitHub file. The latter is subject to change at any moment in non-compatible ways, and due to the configuration of GitHub's raw file server you may run into caching issues.

Missing values will be represented as nulls, whereas zeroes are used when a true value of zero is reported.
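When parsing the CSV files yourself, it helps to keep that distinction explicit. The sketch below assumes that a null appears as an empty CSV field and that the values are whole numbers; the column names and sample values are illustrative placeholders, not real records.

```python
import csv
import io

# Placeholder sample: the first row reports a true zero for new_confirmed
# and has no value (null) for new_deceased.
SAMPLE_CSV = """key,date,new_confirmed,new_deceased
AU,2020-04-01,0,
AU,2020-04-02,12,1
"""


def parse_optional_int(cell: str):
    """Return None for an empty cell, otherwise the integer value."""
    cell = cell.strip()
    return int(cell) if cell else None


records = []
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    records.append({
        "key": row["key"],
        "date": row["date"],
        "new_confirmed": parse_optional_int(row["new_confirmed"]),
        "new_deceased": parse_optional_int(row["new_deceased"]),
    })

# Skip nulls when aggregating so that "not reported" is not counted as zero.
total_deceased = sum(
    r["new_deceased"] for r in records if r["new_deceased"] is not None
)
print(records[0])
print(total_deceased)
```

Testing with `is None` rather than truthiness matters here, because a reported zero is falsy in Python but is still a real observation.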

For information about each table, see the corresponding documentation linked above.

Aggregated table

Flat table with records from all other tables joined by key and date. See above for links to the documentation for each individual table. Due to technical limitations, not all tables can be included as part of this aggregated table.

Notes about the data

For countries where both country-level and subregion-level data is available, the entry which has a null value for the subregion level columns in the index table indicates upper-level aggregation. For example, if a data point has values {country_code: US, subregion1_code: CA, subregion2_code: null, ...} then that record will have data aggregated at the subregion1 (i.e. state/province) level. If subregion1_code were null, then it would be data aggregated at the country level.

Another way to tell the level of aggregation is the aggregation_level column of the index table. See the schema documentation for more details about how to interpret it.
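The null-based rule described above can be sketched as a small function. It assumes each index record is available as a dictionary with the subregion1_code and subregion2_code fields, with None standing for null. The function name is hypothetical, and level 3 (locality) cannot be told apart from these two codes alone, so the sketch stops at level 2.

```python
def aggregation_level(record: dict) -> int:
    """Infer the aggregation level from which subregion codes are null."""
    if record.get("subregion1_code") is None:
        return 0  # country
    if record.get("subregion2_code") is None:
        return 1  # province, state, or local equivalent
    return 2  # municipality, county, or local equivalent


# The first record mirrors the example in the text; the others are
# illustrative placeholders.
state = {"country_code": "US", "subregion1_code": "CA", "subregion2_code": None}
country = {"country_code": "US", "subregion1_code": None, "subregion2_code": None}
county = {"country_code": "US", "subregion1_code": "CA", "subregion2_code": "06085"}

print(aggregation_level(state))
print(aggregation_level(country))
print(aggregation_level(county))
```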

Please note that the country-level data and the region-level data sometimes come from different sources, so the sum of all region-level values may not exactly equal the reported country-level value. See the data loading tutorial for more information.

Data updates

The data for each table is updated at least daily. Individual tables, for example Epidemiology, have fresher data than the aggregated table and are updated multiple times a day. Each individual data source has its own update schedule and some are not updated at regular intervals; the data tables hosted here only reflect the latest data published by the sources.

Contribute

Technical contributions to the data extraction pipeline are welcome. Take a look at the source directory for more information.

If you spot an error in the data, feel free to open an issue on this repository and we will review it.

If you do something with this data, for example a research paper or work related to visualization or analysis, please let us know!

For Data Owners

We have carefully checked the license and attribution information on each data source included in this repository, and in many cases have contacted the data owners directly to ask how they would like to be attributed.

If you are the owner of a data source included here and would like us to remove data, add or alter an attribution, or add or alter license information, please open an issue on this repository and we will happily consider your request.

Licensing

The output data files are published under the CC BY license. All data is subject to the terms of agreement individual to each data source; refer to the sources of data table for more details. All other code and assets are published under the Apache License 2.0.

Sources of data

All data in this repository is retrieved automatically. When possible, data is retrieved directly from the relevant authorities, like a country's ministry of health. For a list of individual data sources, please see the documentation for the individual tables linked at the top of this page.

Running the data extraction pipeline

See the source documentation for more technical details.

Acknowledgments and collaborations

This project has been done in collaboration with FinMango, which provided great insights about the impact of the pandemic on the local economies and also helped with research and manual curation of data sources for many regions including South Africa and US states.

Stratified mortality data for US states is provided by Imperial College London. Please refer to this list of maintainers and contributors for the individual acknowledgements.

The following persons have made significant contributions to this project:

  • Oscar Wahltinez
  • Kevin Murphy
  • Michael Brenner
  • Matt Lee
  • Anthony Erlinger
  • Mayank Daswani
  • Pranali Yawalkar
  • Zack Ontiveros
  • Ruth Alcantara
  • Donny Cheung
  • Aurora Cheung
  • Chandan Nath
  • Paula Le
  • Ofir Picazo Navarro

Recommended citation

Please use the following when citing this project as a source of data:

@article{Wahltinez2020,
  author = "O. Wahltinez and others",
  year = 2020,
  title = "COVID-19 Open-Data: curating a fine-grained, global-scale data repository for SARS-CoV-2",
  note = "Work in progress",
  url = {https://goo.gle/covid-19-open-data},
}
