
EDirect_EUtils_API_Cookbook

Just copy and paste commands off the page. Modify the search strings to work for you!

If there are things you want to be able to do with EDirect, but can't figure out how, you can ask the community for help by creating an Issue. See below, under "How to contribute," for more information.

To install EDirect, follow the instructions in "Entrez Direct: E-utilities on the Unix Command Line"

PLEASE UPDATE TO THE LATEST VERSION of EDirect when possible to avoid a bug in older versions associated with the new NCBI API rate limit policy and API keys.

How to contribute

You can contribute to this page through GitHub. (If you are not already viewing the GitHub version of this page, please click the "View on GitHub" button at the top of the page.) Using GitHub, you can create Issues or Pull Requests to contribute to the cookbook.

Create an Issue to:

  • Request an EDirect script to accomplish a task, citing specific use cases
  • Present a non-working EDirect script and ask for a fix
  • Identify non-working scripts listed below

Create a Pull Request to:

  • Add a working EDirect script to the list below
  • Modify or optimize an EDirect script listed below
  • Update the "Confirmed by:" date/version of a listed EDirect script with confirmation that it is still valid

Best Practices for EDirect:

  • Please keep searches to fewer than 50,000 expected hits (larger result sets simply won’t work)
  • Please do not run from multiple processors on a compute farm
  • Update to the latest version
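
One way to stay under the hit limit when starting from a long list of IDs is to work in batches. The sketch below is only an illustration under stated assumptions: split_ids is a hypothetical helper (not part of EDirect), and the file names, the batch size of 10000 and the one-second pause are arbitrary choices. Only the standard split utility is actually run; the EDirect loop is left as a comment because it contacts NCBI.

```shell
# Hypothetical helper (not part of EDirect): split a file of IDs, one per
# line, into chunk files of at most $2 lines each inside directory $3,
# then print the chunk file paths in order.
split_ids() {
    ids_file=$1
    chunk_size=$2
    out_dir=$3
    mkdir -p "$out_dir"
    split -l "$chunk_size" "$ids_file" "$out_dir/chunk_"
    ls "$out_dir"/chunk_*
}

# Example use (commented out because it contacts NCBI):
# for chunk in $(split_ids ids.txt 10000 chunks); do
#     epost -db pubmed -input "$chunk" | efetch -format uid
#     sleep 1   # pause between batches; the interval is an arbitrary choice
# done
```

Running the batches one after another from a single process also keeps to the request above about not running from multiple processors.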

For more information and documentation on EDirect, please see:

All items below come with no explicit or implicit warranty.

All code is as-is and produced for the bioinformatics community, from the bioinformatics community.

EDirect Scripts

Get all proteins from a nucleotide interval in a genome

Description (optional):
Written by: Peter Cooper
Confirmed by: Ben Busby
Databases: nuccore

efetch -db nuccore -id NZ_AZKP01000022.1 -seq_start 149413 -seq_stop 154038 -format gbc | xtract -insd CDS INSDInterval_from INSDInterval_to protein_id product

Get child taxids for a node in NCBI taxonomy

Description (optional): Note: Options for parsing nodes.dmp from NCBI Taxonomy are cited in issue #25, intentionally left open
Written by: Scott McGinnis (11/17/2017)
Confirmed by:
Databases: Taxonomy

esearch -db taxonomy -query "vertebrata[orgn]" | efetch -db taxonomy -format docsum | xtract -pattern DocumentSummary -if Rank -equals family -element Id,Division,ScientificName,CommonName | more

Get all SRA runs for a BioProject based on an SRA Run ID

Description: Given an SRA Run ID (e.g. SRR532256) that is a member of a BioProject that has additional runs, retrieve all the other run IDs. This is a variant of the BioProject call below.
Written by: Rob Edwards (1/11/2018)
Confirmed by:
Databases: SRA, BioProject

esearch -db sra -query "SRR532256" | efetch -format docsum | xtract -pattern Runs -ACC @acc -element "&ACC"

Get all SRA runs for a given BioProject

Description (optional):
Written by: Bob Sanders (3/22/2017)
Confirmed by:
Databases: SRA, BioProject

esearch -db bioproject -query "PRJNA356464" | elink -target sra | efetch -format docsum | \
xtract -pattern DocumentSummary -ACC @acc -block DocumentSummary -element "&ACC"

Get latitude and longitude for SRA Datasets (e.g. outbreaks and metagenomes)

Description (optional):
Written by: BB, Mike D, Rob Edwards (4/12/2017)
Confirmed by:
Databases: SRA, BioSample

for i in $(cat sra_ids.txt); do ll=$(esearch -db sra -query "$i" | \
elink -target biosample | efetch -format docsum | \
xtract -pattern DocumentSummary -block Attribute -if Attribute@attribute_name -equals lat_lon -element Attribute); \
echo -e "$i\t$ll"; done

Get run sizes (in bp) for SRA Datasets

Description (optional): This retrieves the SRR ID and the size in bp of each run from a file (ids.txt) of SRR IDs. You can also change bases to size_MB to get the size of the dataset in MB. Question: Does the size in MB include the sequence identifiers (i.e. the size of the file) or just the sequences?
Written by: Rob Edwards (7/6/2017)
Confirmed by:
Databases: SRA

epost -db sra -input ids.txt -format acc | esummary -format runinfo -mode xml | xtract -pattern Row -element Run,bases

Gene Aliases

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: gene

esearch -db gene -query "Liver cancer AND Homo sapiens" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Name OtherAliases OtherDesignations

Genomic sequence fastas from RefSeq assembly for specified taxonomic designation

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Peter Cooper (NCBI) and Wayne Matten (NCBI) (12/29/2016, v6.00)
Databases: assembly

wget `esearch -db assembly -query "Leptospira alstonii[ORGN] AND latest[SB]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element FtpPath_RefSeq | \
awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}'`
(For larger sets of data the above may fail as wget may not accept a very large number of arguments.
The command below should work for all.)

esearch -db assembly -query "Leptospira alstonii[ORGN] AND latest[SB]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element FtpPath_RefSeq | \
awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}' | \
xargs wget
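
To see what the awk step does without contacting NCBI, it can be run on a single path. In this sketch the input is a made-up directory path (not a real assembly) and refseq_fasta_url is a hypothetical wrapper name; the awk program itself is the one used in the commands above.

```shell
# Hypothetical wrapper around the awk step: split each input line on "/"
# and append "/<last path component>_genomic.fna.gz" to it.
refseq_fasta_url() {
    awk -F"/" '{print $0"/"$NF"_genomic.fna.gz"}'
}

# Made-up path used only for illustration.
echo 'ftp://ftp.example.org/genomes/all/GCF/000/000/000/GCF_000000000.1_ExampleAsm' | refseq_fasta_url
```

The last path component is repeated as the file name prefix, so the printed line ends in GCF_000000000.1_ExampleAsm/GCF_000000000.1_ExampleAsm_genomic.fna.gz.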

Get organellar contigs from GenBank

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore

esearch -db nuccore -query "LKAM01" | efetch -format fasta

Get protein sequences from nucleotide accessions

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore, protein

cat accs_file | epost -db nuccore -format acc | \
elink -target protein | efetch -format fasta

Complete taxonomy (KPCOFG) for taxids

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: taxonomy

efetch -db taxonomy -id 9606,1234,81726 -format xml | \
xtract -pattern Taxon -tab "," -first TaxId ScientificName \
-group Taxon -KING "(-)" -PHYL "(-)" -CLSS "(-)" -ORDR "(-)" -FMLY "(-)" -GNUS "(-)" \
-block "*/Taxon" -match "Rank:kingdom" -KING ScientificName \
-block "*/Taxon" -match "Rank:phylum" -PHYL ScientificName \
-block "*/Taxon" -match "Rank:class" -CLSS ScientificName \
-block "*/Taxon" -match "Rank:order" -ORDR ScientificName \
-block "*/Taxon" -match "Rank:family" -FMLY ScientificName \
-block "*/Taxon" -match "Rank:genus" -GNUS ScientificName \
-group Taxon -tab "," -element "&KING" "&PHYL" "&CLSS" "&ORDR" "&FMLY" "&GNUS"

Obtain UniProt IDs from gene symbols

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: gene, protein

esearch -db gene -query "tp53[preferred symbol] AND human[organism]" | \
elink -target protein | \
esummary | \
xtract -pattern DocumentSummary -element Caption SourceDb | \
grep -E '^[OPQ][0-9][A-Z0-9]{3}[0-9]|^[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}'
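
The final grep step keeps only lines that begin with something shaped like a UniProt accession. The sketch below checks that pattern offline against a few sample strings. is_uniprot_acc is a hypothetical helper name, and here the pattern is anchored at both ends so it tests a whole string rather than the start of a line. Note that in an extended regular expression the alternation operator is a plain |, not \|.

```shell
# UniProt accession pattern, anchored to the whole string.
uniprot_re='^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$'

# Hypothetical helper: exit status 0 when the argument matches the pattern.
is_uniprot_acc() {
    printf '%s\n' "$1" | grep -Eq "$uniprot_re"
}

# Two accession-shaped strings and one RefSeq-style string for contrast.
for acc in P04637 A0A024R1R8 NP_000537; do
    if is_uniprot_acc "$acc"; then
        echo "$acc: UniProt-style"
    else
        echo "$acc: other"
    fi
done
```

The first two strings match and the RefSeq-style NP_000537 does not, which is how the grep step separates UniProt entries from the other linked protein records.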

Retrieve Taxon IDs from list of genome accession numbers

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore

cat genome_accession.txt | \
epost -db nuccore -format acc | \
esummary | \
xtract -pattern DocumentSummary -element AccessionVersion TaxId

Convert article DOI to PMID

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016, v5.80)
Databases: pubmed

esearch -db pubmed -query "10.1111/j.1468-3083.2012.04708.x" | \
esummary | \
xtract -pattern DocumentSummary -block ArticleId -sep "\t" -tab "\n" -element IdType,Value | \
grep -E '^pubmed|doi'

Access organism specific meta-data from NCBI genome database

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: genome, bioproject

esearch -db genome -query "22954[uid]" | \
elink -target bioproject | \
efetch -format xml | \
xtract -pattern DocumentSummary -element Salinity OxygenReq OptimumTemperature TemperatureRange Habitat

Get the status of records from PubMed search

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016, v5.80)
Databases: pubmed

esearch -db pubmed -query "pde3a AND 2016[dp]" | \
esummary | \
xtract -pattern DocumentSummary -element Id RecordStatus

Conduct a PubMed search and retrieve the results as a list of PMIDs

Description (optional):
Written by: Mike Davidson (2/22/2017)
Confirmed by: Mike Davidson (NLM) (2/22/2017, v6.30)
Databases: pubmed

esearch -db pubmed -query "seasonal affective disorder" | efetch -format uid

Sort the hits by sequence length in nucleotide database

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore

esearch -db nuccore -query "bacillus[orgn] AND biomol_rRNA[prop] AND 1500:1560[slen]" | \
esummary | \
xtract -pattern DocumentSummary -element Slen Extra | \
sort -rnk 1

Getting meta data from assembly

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: assembly

esearch -db assembly -query "mammals[orgn] AND latest[filter]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Organism,SpeciesName,BioSampleAccn,LastMajorReleaseAccession \
-block Stat -if "@category" -equals chromosome_count -element Stat | \
grep -Pv "\t0$"

Fetch HSPs from a BLAST hit in FASTA

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore

blastn -db nr -query in.fna -remote -outfmt "6 sacc sstart send" | \
xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'
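
Before sending requests, it can help to preview the efetch commands that the xargs step would build. The sketch below is a dry run under stated assumptions: preview_hsp_fetch is a hypothetical name, the two input rows are made-up stand-ins for blastn output in "sacc sstart send" order, and echo replaces the real efetch call so nothing is sent to NCBI.

```shell
# Hypothetical dry-run helper: group whitespace-separated input into sets
# of three and print the efetch command each set would produce. With
# sh -c, the first argument after the script is $0, then $1 and $2.
preview_hsp_fetch() {
    xargs -n 3 sh -c 'echo efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'
}

# Made-up rows standing in for blastn tabular output.
printf 'ACC_ONE.1\t100\t250\nACC_TWO.1\t5\t80\n' | preview_hsp_fetch
```

Each input row becomes one command line, for example efetch -db nuccore -id ACC_ONE.1 -seq_start 100 -seq_stop 250 -format fasta.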

Get all Gene Ontology IDs for a given protein accession

Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: protein, biosystems

epost -db protein -id BAD92651.1 -format acc | \
elink -target biosystems | \
efetch -format docsum | \
xtract -pattern externalid -element externalid | \
awk '{if ($0 ~ /GO/) print $0}'

Get the ten most frequently-occurring authors for a set of articles

Description (optional): Searches PubMed for the string "traumatic brain injury athletes", restricts results to those published in 2015 and 2016, retrieves the full XML records for each of the search results, extracts the last name and initials of every author on every record, sorts the authors by frequency of occurrence in the results set, and presents the top ten most frequently-occurring authors, along with the number of times that author appeared.
Written by: Mike Davidson (NLM) (12/15/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016)
Databases: pubmed

esearch -db pubmed -query "traumatic brain injury athletes" -datetype PDAT -mindate 2015 -maxdate 2016 | \
efetch -format xml | \
xtract -pattern Author -sep " " -element LastName,Initials | \
sort-uniq-count-rank | \
head -n 10
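
The counting step can be approximated with standard Unix tools, which may help when checking results on a small sample. This is a sketch, not the EDirect script itself: count_rank is a hypothetical name, the author names are made up, and the real sort-uniq-count-rank may differ in details such as case handling.

```shell
# Hypothetical stand-in for the counting step: tally identical input
# lines and print "count<TAB>line", most frequent first.
count_rank() {
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Made-up author list, one "LastName Initials" per line.
printf 'Smith J\nLee K\nSmith J\nLee K\nSmith J\nGarcia M\n' | count_rank | head -n 2
```

With this sample input the two lines printed are Smith J with a count of 3 and Lee K with a count of 2.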

Get the ten funding agencies who are most active in funding articles on a particular topic

Description (optional): Searches PubMed for the string "diabetes AND pregnancy", restricts results to those published in 2014 through 2016, retrieves the full XML records for each of the search results, extracts the funding agencies for every grant on every record, sorts the agencies by frequency of occurrence in the results set, and presents the top ten most frequently-occurring agencies, along with the number of times that agency appeared.
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed

esearch -db pubmed -query "diabetes AND pregnancy" -datetype PDAT -mindate 2014 -maxdate 2016 | \
efetch -format xml | \
xtract -pattern Grant -element Agency | \
sort-uniq-count-rank | \
head -n 10

Look up the publication date for thousands of PMIDs (option one)

Description (optional): Takes a file which contains a list of PMIDs (table_of_pubmed_ids) and uses cat to access the contents of the file, epost to post the PMIDs to the history server, efetch to retrieve the records and xtract to extract PMID and Publication Date.
Written by: NCBI Folks (12/15/2016)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed

cat table_of_pubmed_ids | \
epost -db pubmed | \
efetch -format xml | \
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block PubDate -sep " " -element Year,Month MedlineDate

Look up the publication date for thousands of PMIDs (option two)

Description (optional): Takes a file which contains a list of PMIDs (table_of_pubmed_ids) and epost -input to access the contents of the file and post the PMIDs to the history server, efetch to retrieve the records and xtract to extract PMID and Publication Date.
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed

epost -input table_of_pubmed_ids -db pubmed | \
efetch -format xml | \
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block PubDate -sep " " -element Year,Month MedlineDate

Find the first author for a set of PubMed records

Description (optional): Outputs the PMID and first author's last name and initials for one or more PubMed records
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed

efetch -db pubmed -id 16940437 -format xml | \
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block Author -position first -sep " " -element LastName,Initials

Find the first author and any other authors who contributed equally for a set of PubMed records

Description (optional): Outputs the PMID and first author's last name and initials for one or more PubMed records. If the record indicates equal contributors to the first author, the last name and initials for all equal contributors will also be output, separated by commas.
Written by: Mike Davidson (10/27/2017)
Confirmed by: Mike Davidson (NLM) (v7.40, 10/27/2017)
Databases: pubmed

efetch -db pubmed -id 22358458,26877147 -format xml | \
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block Author -position first -sep " " -tab ", " -element LastName,Initials -EQUAL Author@EqualContrib \
-block Author -if "+" -is-not 1 \
-and Author@EqualContrib -equals Y \
-and "&EQUAL" -equals Y \
-sep " " -tab ", " -element LastName,Initials

Download GEO Data from a BioProject Accession

Description (optional):
Written by: NCBI Folks (12/16/2016)
Confirmed by:
Databases: gds

esearch -db gds -query "PRJNA313294[ACCN]" | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element FTPLink

Extract all MeSH Headings from a given PMID

Description (optional): Retrieves the PMID of a PubMed record, followed by a pipe-delimited list of MeSH Descriptors for that record.
Written by: Mike Davidson (10/02/2017)
Confirmed by: Mike Davidson (NLM) (v7.30, 10/02/2017)
Databases: pubmed

efetch -db pubmed -id 24102982 -format xml | \
xtract -pattern PubmedArticle -tab "|" -element MedlineCitation/PMID \
-block MeshHeading -tab "|" -element DescriptorName

Extract all MeSH Headings and Subheadings from a given PMID

Description (optional): Retrieves the PMID of a PubMed record, followed by a pipe-delimited list of MeSH Descriptors and Qualifiers for that record. Each Descriptor is followed by any attached qualifiers, separated by "/".
Written by: Mike Davidson (10/02/2017)
Confirmed by: Mike Davidson (NLM) (v7.30, 10/02/2017)
Databases: pubmed

efetch -db pubmed -id 24102982 -format xml | \
xtract -pattern PubmedArticle -tab "|" -element MedlineCitation/PMID \
-block MeshHeading -tab "|" -sep "/" -element DescriptorName,QualifierName

Search for articles by authors affiliated with a specific institution by matching two partial affiliation strings.

Description (optional): Searching PubMed for two affiliation strings ANDed together (e.g. "translational medicine[AD] AND thomas jefferson[AD]") will retrieve all records that have both strings listed somewhere in the record's Affiliation data, but does not require both strings be listed on the same author's affiliation. To generate a list of PMIDs where both strings are present in the same affiliation element, use the following script.
Written by: Mike Davidson (4/2/2018)
Confirmed by: Mike Davidson (NLM) (v8.10, 4/2/2018)
Databases: pubmed

esearch -db pubmed -query "translational medicine[ad] AND thomas jefferson[ad]" | \
efetch -format xml | \
xtract -pattern PubmedArticle -PMID MedlineCitation/PMID \
-block Affiliation -if Affiliation -contains "translational medicine" -and Affiliation -contains "thomas jefferson" \
-tab "\n" -element "&PMID" | \
sort -n | uniq
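
The difference between matching per record and matching per affiliation element can be illustrated offline. The sketch below makes several assumptions: same_affiliation_pmids is a hypothetical helper, the input is made-up data with one affiliation element per line in the form PMID, tab, affiliation text, and matching is done case-insensitively by lowercasing the text (so the search strings must be given in lowercase).

```shell
# Hypothetical helper: print PMIDs for which a single affiliation line
# contains both search strings ($1 and $2, given in lowercase).
same_affiliation_pmids() {
    awk -F '\t' -v a="$1" -v b="$2" '
        { aff = tolower($2) }
        index(aff, a) && index(aff, b) { print $1 }
    ' | sort -n | uniq
}

# Made-up records: PMID 111 has both strings in one affiliation, while
# PMID 222 has them spread across two different affiliations.
printf '%s\t%s\n' \
    111 'Institute for Translational Medicine, Thomas Jefferson University' \
    222 'Department of Translational Medicine, Example University' \
    222 'Thomas Jefferson University Hospital' |
same_affiliation_pmids 'translational medicine' 'thomas jefferson'
```

Only 111 is printed. A record-level search would also have returned 222, since both strings occur somewhere in that record.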

Search for PMC articles citing a given PubMed article; retrieve title, source, ID

Description: Retrieve information about all PMC articles (which have free full text available) that cite a given PubMed article
Written by: Lukas Wagner (08/16/2018)
Databases: pubmed, pmc

esearch -db pubmed -query 23618408 | elink -name pubmed_pmc_refs -target pmc | \
efetch -format docsum | \
xtract -pattern DocumentSummary -element Title -element Source -block ArticleId -if "IdType" -equals pmcid -element Value
