  • License: MIT License
  • Created over 8 years ago
  • Updated over 5 years ago



awesome-cancer-variant-databases

A community-maintained repository of cancer clinical knowledge bases and databases focused on cancer and normal variants. Contributions welcome.

Also see our bioRxiv manuscript, "Resources for Interpreting Variants in Precision Genomic Oncology Applications", and the related Frontiers in Oncology publication.

Clinically-focused databases

  • CanDL - an expert-curated database of potentially actionable driver mutations for molecular pathologists and laboratory directors to facilitate literature-based annotation of genomic testing of tumors. [web app, Download]
  • Cancer Genome Interpreter - designed to support the identification of tumor alterations that drive the disease and detect those that may be therapeutically actionable. CGI relies on existing knowledge collected from several resources and on computational methods that annotate the alterations in a tumor according to distinct levels of evidence. [web app, API]
  • CIViC - CIViC is an open-access, open-source, community-driven web resource for Clinical Interpretation of Variants in Cancer. [web app, API, Download]
  • DGIdb - Mining the druggable genome for personalized medicine. [web app, API, Download]
  • Database of Curated Mutations (DoCM) - DoCM is a highly curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for verification. [web app, API, Download]
  • JAX Lab Clinical Knowledge Base - The Jackson Laboratory Clinical Knowledgebase (CKB) is a semi-automated/manually curated database of gene/variant annotations, therapy knowledge, diagnostic/prognostic information, and clinical trials related to oncology. [web app]
  • My Cancer Genome - My Cancer Genome is a personalized cancer medicine knowledge resource for physicians, patients, caregivers, and researchers. It provides up-to-date information on which mutations drive cancer growth and on their therapeutic implications, including available clinical trials. [web app, API, Download, may require licensing]
  • OncoKB - OncoKB, a comprehensive and curated precision oncology knowledge base, offers oncologists detailed, evidence-based information about individual somatic mutations and structural alterations present in patient tumors with the goal of supporting optimal treatment decisions. [web app, API, Download]
  • PharmGKB - PharmGKB is a comprehensive resource that curates knowledge about the impact of genetic variation on drug response for clinicians and researchers. [web app, Download]
  • Precision Medicine KnowledgeBase (PMKB) - PMKB provides structured information about clinical cancer variants and their interpretations, and lets users submit new entries and edit existing ones so the knowledgebase keeps growing. All changes are reviewed by cancer pathologists. [web app, Download]
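Several of the resources above (CGI in particular) combine knowledge from multiple sources and grade it by level of evidence. The sketch below shows one way a user might merge annotations exported from several knowledge bases, keeping the strongest evidence for each variant and therapy. It is a minimal illustration under stated assumptions: the `Annotation` record, the single-letter "A" (strongest) to "E" grading, and the `KB1`/`KB2`/`KB3` source labels are all hypothetical. Real exports from CIViC, OncoKB, or CGI each use their own schema and evidence scale, which would need to be mapped onto a common scale first.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


# Hypothetical record shape; each real knowledge base has its own schema.
@dataclass(frozen=True)
class Annotation:
    source: str      # illustrative source label, not a real export format
    gene: str
    alteration: str  # protein change, e.g. "V600E"
    therapy: str
    level: str       # assumed single-letter grade, "A" = strongest evidence


def best_evidence(annotations: Iterable[Annotation]) -> dict:
    """Return {(gene, alteration): {therapy: (level, sources)}}.

    For each variant/therapy pair, keep only the strongest evidence level
    seen across all sources, plus every source reporting that level.
    """
    merged = defaultdict(dict)
    for a in annotations:
        key = (a.gene, a.alteration)
        current = merged[key].get(a.therapy)
        if current is None or a.level < current[0]:
            # Stronger (alphabetically earlier) level: replace the entry.
            merged[key][a.therapy] = (a.level, {a.source})
        elif a.level == current[0]:
            # Same level from another source: record that source too.
            current[1].add(a.source)
    # Freeze source sets into sorted tuples for stable, comparable output.
    return {
        key: {t: (lvl, tuple(sorted(srcs))) for t, (lvl, srcs) in therapies.items()}
        for key, therapies in merged.items()
    }


# Made-up records for illustration only; the levels are not real curations.
records = [
    Annotation("KB1", "BRAF", "V600E", "vemurafenib", "B"),
    Annotation("KB2", "BRAF", "V600E", "vemurafenib", "A"),
    Annotation("KB3", "BRAF", "V600E", "vemurafenib", "A"),
    Annotation("KB1", "BRAF", "V600E", "dabrafenib", "C"),
]
print(best_evidence(records))
```

Comparing grades as strings only works because each grade is a single letter; a scheme with grades such as "3A" or "R1" would need an explicit ranking table instead.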

Catalogs

  • VarSome - a large online aggregation database of variants. VarSome is a knowledge base and aggregator for human genomic variants, built to cut the time it takes to look up a variant across many public databases by gathering that information in one place. [web app, API: example API query]
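Aggregators such as VarSome are useful partly because the same variant can be written in several ways. Before comparing variants across databases, it helps to reduce each one to a single key. The following is a minimal sketch that assumes a hypothetical `chrom-pos-ref-alt` key format, similar in shape to gnomAD-style variant IDs. It drops a `chr` prefix and trims bases shared by REF and ALT. It does not left-align indels in repetitive sequence, because that requires the reference genome; `bcftools norm` with a reference FASTA handles that step.

```python
def variant_key(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a simple 'chrom-pos-ref-alt' key (hypothetical format).

    Trims bases shared by REF and ALT, but does NOT left-align indels,
    which would need the reference sequence. Requires Python 3.9+.
    """
    chrom = chrom.removeprefix("chr")
    ref, alt = ref.upper(), alt.upper()
    # Trim shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases, shifting the position right each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return f"{chrom}-{pos}-{ref}-{alt}"


print(variant_key("chr7", 140453136, "a", "t"))  # 7-140453136-A-T
print(variant_key("1", 100, "ACGT", "AGGT"))     # 1-101-C-G
```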

Somatic

Germline

  • dbSNP - [web app, API, Download]
  • gnomAD - The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. The data set provided on the gnomAD website spans 123,136 exome sequences and 15,496 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. [web app, ?API, Download]
  • Kaviar - Kaviar (~Known VARiants) is a compilation of SNVs, indels, and complex variants observed in humans, designed to facilitate testing for the novelty and frequency of observed variants. Kaviar contains 162 million SNV sites (including 25M not in dbSNP) and incorporates data from 35 projects encompassing 77,781 individuals (13.2K whole genome, 64.6K exome). [web app, API, Download]
  • Exome Aggregation Consortium (ExAC) - [web app, API, Download]
  • 1000 Genomes - [web app, API, Download]
  • ClinVar - [web app, Download]
  • Exome Sequencing Project - [web app, Download]
  • Genome of the Netherlands
  • UK10K
  • GEUVADIS: Genetic European Variation in Health and Disease
  • SweGen - [web app, API, Download]
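Kaviar and the other germline catalogs above are commonly used to ask whether an observed variant is novel and, if not, how frequent it is in the population. Below is a minimal sketch of that check. It assumes the allele frequencies have already been loaded into a dictionary keyed by a variant key; the frequencies in the example are made up. The 1% "common" cutoff is an assumption: it is a frequently used threshold, but an arbitrary one. Absence from one catalog only means the variant is novel relative to the cohorts that catalog includes.

```python
from typing import Iterable, Mapping


def classify(variants: Iterable[str],
             known_af: Mapping[str, float],
             common_af: float = 0.01) -> dict:
    """Label each variant key as 'novel', 'rare' or 'common'.

    known_af maps variant keys to population allele frequencies taken
    from a germline catalog; common_af is an assumed cutoff.
    """
    labels = {}
    for key in variants:
        af = known_af.get(key)
        if af is None:
            labels[key] = "novel"   # not seen in this catalog
        elif af >= common_af:
            labels[key] = "common"
        else:
            labels[key] = "rare"
    return labels


# Hypothetical frequencies for illustration only.
known = {"7-140453136-A-T": 0.00001, "1-1000-G-A": 0.25}
print(classify(["7-140453136-A-T", "1-1000-G-A", "2-5-C-T"], known))
```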

Annotation tools and software

  • PCGR - The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from Ensembl's Variant Effect Predictor (VEP) with oncology-relevant, up-to-date annotations retrieved flexibly through vcfanno, and produces interactive HTML reports intended for clinical interpretation. [Software tool]
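PCGR layers oncology-specific annotations onto VEP output using vcfanno, which writes extra fields into the VCF INFO column. The toy function below only illustrates that general idea of INFO-column annotation. It is not how vcfanno or PCGR work internally. The `KB_HIT` tag name and the lookup table are hypothetical, and the code handles single-ALT records only. A real pipeline would also add a matching `##INFO` header line and keep values free of spaces, commas, semicolons, and equals signs.

```python
def annotate_vcf_line(line: str, lookup: dict, tag: str = "KB_HIT") -> str:
    """Append tag=value to the INFO column of one VCF data line.

    lookup maps (CHROM, POS, REF, ALT) tuples to annotation values.
    Header lines and unmatched records are returned unchanged (minus the
    trailing newline). Multi-allelic ALT fields are not split.
    """
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
    value = lookup.get((chrom, int(pos), ref, alt))
    if value is not None:
        entry = f"{tag}={value}"
        info = fields[7]
        # '.' means an empty INFO column in VCF; otherwise append with ';'.
        fields[7] = entry if info in ("", ".") else f"{info};{entry}"
    return "\t".join(fields)


# Hypothetical lookup table and record.
lookup = {("7", 140453136, "A", "T"): "actionable"}
print(annotate_vcf_line("7\t140453136\t.\tA\tT\t50\tPASS\tDP=30\n", lookup))
```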

Variant Effect Prediction tools and databases
