  • Stars: 2,993
  • Rank: 15,106 (Top 0.3%)
  • Language: Jupyter Notebook
  • Created almost 5 years ago
  • Updated about 2 years ago

Repository Details

We are building an open database of COVID-19 cases with chest X-ray or CT images.

🛑 Note: please do not claim diagnostic performance of a model without a clinical study! This is not a Kaggle competition dataset. Please read these papers about evaluation issues: https://arxiv.org/abs/2004.12823 and https://arxiv.org/abs/2004.05405

COVID-19 image data collection (🎬 video about the project)

Project Summary: To build a public open dataset of chest X-ray and CT images of patients who are positive for or suspected of having COVID-19 or other viral and bacterial pneumonias (MERS, SARS, and ARDS). Data will be collected from public sources as well as through indirect collection from hospitals and physicians. All images and data will be released publicly in this GitHub repo.

This project is approved by the University of Montreal's Ethics Committee #CERSES-20-058-D

View current images and metadata and a dataloader example

The labels are arranged in a hierarchy:

Current label counts for the PA, AP, and AP Supine views (0 = No, 1 = Yes). Data loader is here

COVID19_Dataset num_samples=481 views=['PA', 'AP']
{'ARDS': {0.0: 465, 1.0: 16},
 'Bacterial': {0.0: 445, 1.0: 36},
 'COVID-19': {0.0: 162, 1.0: 319},
 'Chlamydophila': {0.0: 480, 1.0: 1},
 'E.Coli': {0.0: 481},
 'Fungal': {0.0: 459, 1.0: 22},
 'Influenza': {0.0: 478, 1.0: 3},
 'Klebsiella': {0.0: 474, 1.0: 7},
 'Legionella': {0.0: 474, 1.0: 7},
 'Lipoid': {0.0: 473, 1.0: 8},
 'MERS': {0.0: 481},
 'Mycoplasma': {0.0: 476, 1.0: 5},
 'No Finding': {0.0: 467, 1.0: 14},
 'Pneumocystis': {0.0: 459, 1.0: 22},
 'Pneumonia': {0.0: 36, 1.0: 445},
 'SARS': {0.0: 465, 1.0: 16},
 'Streptococcus': {0.0: 467, 1.0: 14},
 'Varicella': {0.0: 476, 1.0: 5},
 'Viral': {0.0: 138, 1.0: 343}}

COVID19_Dataset num_samples=173 views=['AP Supine']
{'ARDS': {0.0: 170, 1.0: 3},
 'Bacterial': {0.0: 169, 1.0: 4},
 'COVID-19': {0.0: 41, 1.0: 132},
 'Chlamydophila': {0.0: 173},
 'E.Coli': {0.0: 169, 1.0: 4},
 'Fungal': {0.0: 171, 1.0: 2},
 'Influenza': {0.0: 173},
 'Klebsiella': {0.0: 173},
 'Legionella': {0.0: 173},
 'Lipoid': {0.0: 173},
 'MERS': {0.0: 173},
 'Mycoplasma': {0.0: 173},
 'No Finding': {0.0: 170, 1.0: 3},
 'Pneumocystis': {0.0: 171, 1.0: 2},
 'Pneumonia': {0.0: 26, 1.0: 147},
 'SARS': {0.0: 173},
 'Streptococcus': {0.0: 173},
 'Varicella': {0.0: 173},
 'Viral': {0.0: 41, 1.0: 132}}
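The shape of these counts can be illustrated with a few lines of Python. This is a minimal sketch, not the project's data loader: it assumes (hypothetically) that each image carries a slash-separated finding path such as `Pneumonia/Viral/COVID-19`, so a positive child label also marks every ancestor as positive. The function names and the toy finding strings are made up for illustration. The last lines check one property of the PA/AP numbers listed above, under the further assumption that COVID-19, SARS, Influenza, Varicella, and MERS are the children of Viral.

```python
from collections import Counter

def expand_finding(finding):
    """Split a hierarchical finding path into the set of labels it implies."""
    return {part.strip() for part in finding.split("/") if part.strip()}

def label_counts(findings, label_names):
    """Return {label: {0.0: n_negative, 1.0: n_positive}}, omitting zero counts."""
    counts = {}
    for name in sorted(label_names):
        tally = Counter(1.0 if name in expand_finding(f) else 0.0 for f in findings)
        counts[name] = dict(sorted(tally.items()))
    return counts

# Toy findings, invented for illustration only (not taken from metadata.csv)
findings = [
    "Pneumonia/Viral/COVID-19",
    "Pneumonia/Viral/SARS",
    "Pneumonia/Bacterial/Streptococcus",
    "Pneumonia",
    "No Finding",
]
stats = label_counts(findings, {"Pneumonia", "Viral", "COVID-19", "MERS"})
print(stats)

# Consistency check on the PA/AP counts listed above, assuming these five
# labels are the children of "Viral" in the hierarchy.
viral_children = {"COVID-19": 319, "SARS": 16, "Influenza": 3, "Varicella": 5, "MERS": 0}
viral_total = 343
print(sum(viral_children.values()) == viral_total)
```

A label with no positive examples, such as MERS in the toy data, ends up with only a `0.0` entry, which matches the way rows like `'MERS': {0.0: 481}` appear in the listing.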

Annotations

Lung Bounding Boxes and Chest X-ray Segmentation (license: CC BY 4.0) contributed by General Blockchain, Inc.

Pneumonia severity scores for 94 images (license: CC BY-SA) from the paper Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning

Generated Lung Segmentations (license: CC BY-SA) from the paper Lung Segmentation from Chest X-rays using Variational Data Imputation

Brixia score for 192 images (license: CC BY-NC-SA) from the paper End-to-end learning for semiquantitative rating of COVID-19 severity on Chest X-rays

Lung and other segmentations for 517 images (license: CC BY) in COCO and raster formats by v7labs

Contribute

  • Submit data directly to the project. View our research protocol. Contact us to start the process.

  • We can extract images from publications. Help identify publications which are not already included using a GitHub issue (DOIs we have are listed in the metadata file). There is a searchable database of COVID-19 papers here, and a non-searchable one (requires download) here.

  • Submit data to these sites (we can scrape the data from them):

  • Provide bounding box/masks for the detection of problematic regions in images already collected.

  • See SCHEMA.md for more information on the metadata schema.

Formats: For chest X-rays, dcm, jpg, or png files are preferred. For CT, NIfTI (gzip-compressed) is preferred, but dcm files are also accepted. Please contact us with any questions.
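As a rough illustration of the format preferences, a contributor could pre-check file names before submitting. The helper below is a hypothetical sketch, not part of the repository; the function name, the modality strings, and the assumption that gzip-compressed NIfTI files end in `.nii.gz` are all illustrative choices.

```python
def preferred_format(filename, modality):
    """Classify a file as 'preferred', 'accepted', or 'other' for a modality.

    modality is "xray" or "ct" (hypothetical names used only in this sketch).
    """
    name = filename.lower()
    if modality == "xray":
        # dcm, jpg, or png are the preferred chest X-ray formats
        return "preferred" if name.endswith((".dcm", ".jpg", ".png")) else "other"
    if modality == "ct":
        # gzip-compressed NIfTI is preferred; dcm is also accepted
        if name.endswith(".nii.gz"):
            return "preferred"
        return "accepted" if name.endswith(".dcm") else "other"
    raise ValueError(f"unknown modality: {modality!r}")

print(preferred_format("case12.PNG", "xray"))
print(preferred_format("volume.nii.gz", "ct"))
print(preferred_format("slice001.dcm", "ct"))
```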

Background

In the context of the COVID-19 pandemic, we want to improve prognostic predictions to triage and manage patient care. Data is the first step to developing any diagnostic/prognostic tool. While there exist large public datasets of more typical chest X-rays from the NIH [Wang 2017], Spain [Bustos 2019], Stanford [Irvin 2019], MIT [Johnson 2019] and Indiana University [Demner-Fushman 2016], there is no collection of COVID-19 chest X-rays or CT scans designed to be used for computational analysis.

The 2019 novel coronavirus (COVID-19) presents several unique features [Fang 2020; Ai 2020]. While the diagnosis is confirmed using polymerase chain reaction (PCR), infected patients with pneumonia may present on chest X-ray and computed tomography (CT) images with a pattern that is only moderately characteristic to the human eye [Ng 2020]. In late January 2020, a Chinese team published a paper detailing the clinical and paraclinical features of COVID-19. They reported that patients present abnormalities in chest CT images, with most having bilateral involvement [Huang 2020]. Bilateral multiple lobular and subsegmental areas of consolidation constitute the typical findings in chest CT images of intensive care unit (ICU) patients on admission [Huang 2020]. In comparison, non-ICU patients show bilateral ground-glass opacity and subsegmental areas of consolidation in their chest CT images [Huang 2020]. In these patients, later chest CT images display bilateral ground-glass opacity with resolved consolidation [Huang 2020].

Goal

Our goal is to use these images to develop AI-based approaches to predict and understand the infection. Our group will work to release these models using our open source Chester AI Radiology Assistant platform.

The tasks below take a chest X-ray or CT scan (X-ray preferred) as input:

  • Healthy vs Pneumonia (prototype already implemented in Chester with ~74% AUC, validation study here)

  • Bacterial vs Viral vs COVID-19 Pneumonia (not relevant enough for the clinical workflows)

  • Prognostic/severity predictions (survival, need for intubation, need for supplemental oxygen)
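For readers unfamiliar with the AUC figure quoted for the Healthy vs Pneumonia task, the sketch below shows one common way to compute it: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are toy values invented for illustration and say nothing about any real model; the function name is likewise an assumption of this sketch.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count how often a positive example outranks a negative one
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = pneumonia, 0 = healthy (made-up scores)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.74 means roughly three out of four positive/negative pairs are ordered correctly.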

Expected outcomes

Tool impact: This would give physicians an edge and allow them to act with more confidence while they wait for the analysis of a radiologist by having a digital second opinion confirm their assessment of a patient's condition. Also, these tools can provide quantitative scores to consider and use in studies.

Data impact: Image data linked with clinically relevant attributes in a public dataset that is designed for ML will enable parallel development of these tools and rapid local validation of models. Furthermore, this data can be used for completely different tasks.

Contact

PI: Joseph Paul Cohen. Postdoctoral Fellow, Mila, University of Montreal

Citations

Second Paper available here and source code for baselines

COVID-19 Image Data Collection: Prospective Predictions Are the Future
Joseph Paul Cohen and Paul Morrison and Lan Dao and Karsten Roth and Tim Q Duong and Marzyeh Ghassemi
arXiv:2006.11988, https://github.com/ieee8023/covid-chestxray-dataset, 2020
@article{cohen2020covidProspective,
  title={COVID-19 Image Data Collection: Prospective Predictions Are the Future},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao and Karsten Roth and Tim Q Duong and Marzyeh Ghassemi},
  journal={arXiv 2006.11988},
  url={https://github.com/ieee8023/covid-chestxray-dataset},
  year={2020}
}

Paper available here

COVID-19 image data collection, arXiv:2003.11597, 2020
Joseph Paul Cohen and Paul Morrison and Lan Dao
https://github.com/ieee8023/covid-chestxray-dataset
@article{cohen2020covid,
  title={COVID-19 image data collection},
  author={Joseph Paul Cohen and Paul Morrison and Lan Dao},
  journal={arXiv 2003.11597},
  url={https://github.com/ieee8023/covid-chestxray-dataset},
  year={2020}
}

License

Each image has a license specified in the metadata.csv file; licenses include Apache 2.0, CC BY-NC-SA 4.0, and CC BY 4.0.

The metadata.csv, scripts, and other documents are released under a CC BY-NC-SA 4.0 license. Companies are free to perform research. Beyond that, contact us.
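Because licenses are recorded per image, a downstream user may want to select only images whose license fits their use. The following is a minimal sketch using Python's standard `csv` module; the column names `filename` and `license` and the sample rows are assumptions made for illustration and should be checked against the actual metadata.csv and SCHEMA.md.

```python
import csv
import io

# Hypothetical excerpt; real column names and values may differ
SAMPLE = """filename,license
img1.jpg,CC BY 4.0
img2.png,CC BY-NC-SA 4.0
img3.jpg,Apache 2.0
"""

def files_with_license(csv_text, allowed):
    """Return the filenames whose license string is in the allowed set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["filename"] for row in reader if row["license"].strip() in allowed]

selected = files_with_license(SAMPLE, {"CC BY 4.0", "Apache 2.0"})
print(selected)  # ['img1.jpg', 'img3.jpg']
```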

More Repositories

1. PDFViewer (C, 141 stars): PDF Viewer for Android source code
2. blucat (Java, 72 stars): Blucat (netcat for Bluetooth)
3. NeuralNetwork-Examples (Jupyter Notebook, 70 stars): The same small networks implemented in different frameworks
4. medical-reading-group (56 stars)
5. countception (Jupyter Notebook, 52 stars): Count-Ception: Counting by Fully Convolutional Redundant Counting
6. JoeGlass (Java, 24 stars)
7. deep-learning-datasets (23 stars)
8. blindtool (Java, 12 stars): BlindTool – A mobile app that gives a "sense of vision" to the blind with deep learning
9. latentshift (Jupyter Notebook, 11 stars): A method to generate counterfactuals
10. CraterDataset (9 stars)
11. no-citation-page-numbers (9 stars): BST files to remove the page numbers in CITATIONS
12. sparse-coding (Jupyter Notebook, 6 stars)
13. UMB-Thesis-Template (TeX, 6 stars): A Latex template for the UMB Masters Thesis
14. xray-generalization (Jupyter Notebook, 5 stars)
15. XTreePath (HTML, 5 stars)
16. dist-bias (Jupyter Notebook, 4 stars)
17. cratercnn-cli (Java, 4 stars)
18. gravityscript (3 stars): Automatically exported from code.google.com/p/gravityscript
19. FaceFinder (C++, 3 stars): Find faces in images in a batch style
20. movieposters-dataset (3 stars)
21. frequency-spectrum-dump (Java, 3 stars): Frequency Spectrum Dump. Inputs: mp3, wav, ogg, flac, etc. Scriptable, Visualization. Output: csv file with the mean magnitudes of 255 frequency bands
22. blucat-android-remote (Java, 2 stars)
23. single-cell-rep (Jupyter Notebook, 2 stars)
24. craterseeker (JavaScript, 2 stars): Open Source Crater Seeker Video Game
25. pharmacodb (Jupyter Notebook, 2 stars)
26. cs210-summer2014 (Java, 2 stars)
27. ct-counterfactuals (Jupyter Notebook, 2 stars): This repo contains code and models to generate counterfactual images (modifying the images so a classifier will not predict positive for a class) for 3D Computed Tomography volumes. The method used is Latent Shift.
28. BioFormats-Example (HTML, 1 star)
29. shapes-dataset (Python, 1 star)
30. cellradioshutoff (Java, 1 star): The source of Cell Radio Shutoff
31. joelearn (Python, 1 star): A collection of scripts that I use
32. pojoexplorer (Java, 1 star): You can explore objects by dynamically calling the getters while the code is running!
33. eigengenes (Python, 1 star)
34. hpc-demo (Java, 1 star)
35. ChestXray-NIHCC (Jupyter Notebook, 1 star)
36. hibernatebrowser (Java, 1 star): Automatically exported from code.google.com/p/hibernatebrowser
37. CraterDetection (Python, 1 star)
38. umb-latex-homework (TeX, 1 star): Automatically exported from code.google.com/p/umb-latex-homework
39. historical-weather (Jupyter Notebook, 1 star)
40. censorship-test (Java, 1 star)
41. slurm_check (Python, 1 star): Test scripts to test the Mila cluster
42. USSConstitutionMuseum-CannonForce (Java, 1 star)
43. mrna-representation (Jupyter Notebook, 1 star)
44. weka-api-demo (Java, 1 star): Automatically exported from code.google.com/p/weka-api-demo