• Stars: 936
• Rank: 48,823 (Top 1.0 %)
• Language: Python
• License: BSD 2-Clause
• Created over 2 years ago
• Updated about 1 month ago


Repository Details

🧬 gget enables efficient querying of genomic reference databases

gget


gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.


If you use gget in a publication, please cite:

Luebbert, L., & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836

Read the article here: https://doi.org/10.1093/bioinformatics/btac836

Installation

pip install --upgrade gget

Alternative:

conda install -c bioconda gget

For use in Jupyter Lab / Google Colab:

import gget
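To check which gget release an environment (for example a fresh Colab runtime) actually has before importing it, the standard-library `importlib.metadata` module can be queried. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of gget.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "gget") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # Not installed here; e.g. run `pip install --upgrade gget` first.
        return None


print("gget version:", installed_version() or "not installed")
```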

🔗 Manual

🪄 Quick start guide

Command line:

# Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
$ gget ref homo_sapiens

# Get Ensembl IDs of human genes with "ace2" or "angiotensin converting enzyme 2" in their name/description
$ gget search -s homo_sapiens 'ace2' 'angiotensin converting enzyme 2'

# Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519
$ gget info ENSG00000130234 ENST00000252519

# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
$ gget seq --translate ENSG00000130234

# Quickly find the genomic location of (the start of) the amino acid sequence returned by gget seq
$ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# BLAST (the start of) the amino acid sequence returned by gget seq
$ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Align nucleotide or amino acid sequences stored in a FASTA file
$ gget muscle path/to/file.fa

# Use Enrichr for an ontology analysis of a list of genes
$ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P

# Get the human tissue expression of gene ACE2
$ gget archs4 -w tissue ACE2

# Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank 
# (PDB IDs can be returned by gget info with flag --pdb)
$ gget pdb 1R42 -o 1R42.pdb

# Fetch an scRNAseq count matrix (AnnData format) based on specified gene(s), tissue(s) and cell type(s) (default species: human)
$ gget setup cellxgene # setup only needs to be run once
$ gget cellxgene --gene ACE2 SLC5A1 --tissue lung --cell_type 'mucus secreting cell' -o example_adata.h5ad

# Predict the protein structure of GFP from its amino acid sequence
$ gget setup alphafold # setup only needs to be run once
$ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
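`gget muscle` reads the sequences to align from a FASTA file. If those sequences are held in Python, a minimal sketch for writing such a file is below. The helper `write_fasta`, the file name `example_seqs.fa`, and the toy sequences are hypothetical placeholders; wrapping at 60 characters is a common FASTA convention, not a documented gget requirement.

```python
from pathlib import Path
from typing import Dict, Union


def write_fasta(records: Dict[str, str], path: Union[str, Path], width: int = 60) -> Path:
    """Write {name: sequence} pairs to `path` in FASTA format, wrapping at `width` characters."""
    path = Path(path)
    with path.open("w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            # Emit the sequence in chunks so no line exceeds `width` characters.
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
    return path


# Toy sequences for illustration only.
fasta_path = write_fasta(
    {"seq_a": "MSSSSWLLLSLVAVTAAQST", "seq_b": "MSSSSWLLLSLVAVTAAQSA"},
    "example_seqs.fa",
)
```

The resulting file can then be passed to `gget muscle example_seqs.fa` on the command line or to `gget.muscle("example_seqs.fa")` in Python.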

Python (Jupyter Lab / Google Colab):

import gget
gget.ref("homo_sapiens")
gget.search(["ace2", "angiotensin converting enzyme 2"], "homo_sapiens")
gget.info(["ENSG00000130234", "ENST00000252519"])
gget.seq("ENSG00000130234", translate=True)
gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.muscle("path/to/file.fa")
gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
gget.archs4("ACE2", which="tissue")
gget.pdb("1R42", save=True)

gget.setup("cellxgene") # setup only needs to be run once
gget.cellxgene(gene=["ACE2", "SLC5A1"], tissue="lung", cell_type="mucus secreting cell")

gget.setup("alphafold") # setup only needs to be run once
gget.alphafold("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK")
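The BLAT and BLAST examples query only the start of the protein returned by `gget.seq`. A minimal sketch for doing that programmatically is below. It assumes the translated result comes back as a list of FASTA-formatted lines (a `>` header followed by sequence lines); check the gget seq documentation for the exact return type in your version. The helpers `parse_fasta_lines` and `query_prefix`, and the 47-residue default (the length of the quick-start query), are illustrative choices rather than gget functionality.

```python
from typing import Dict, List

# The 20 standard amino-acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_lines(lines: List[str]) -> Dict[str, str]:
    """Group FASTA lines into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def query_prefix(sequence: str, n: int = 47) -> str:
    """Return the first n residues (upper-cased), rejecting non-standard letters."""
    prefix = sequence[:n].upper()
    unexpected = set(prefix) - STANDARD_AA
    if unexpected:
        raise ValueError(f"non-standard residues: {sorted(unexpected)}")
    return prefix


# Hypothetical usage (needs gget and network access):
# import gget
# fasta_lines = gget.seq("ENSG00000130234", translate=True)
# header, protein = next(iter(parse_fasta_lines(fasta_lines).items()))
# gget.blat(query_prefix(protein))
```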

Call gget from R using reticulate:

system("pip install gget")
install.packages("reticulate")
library(reticulate)
gget <- import("gget")

gget$ref("homo_sapiens")
gget$search(list("ace2", "angiotensin converting enzyme 2"), "homo_sapiens")
gget$info(list("ENSG00000130234", "ENST00000252519"))
gget$seq("ENSG00000130234", translate=TRUE)
gget$blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget$muscle("path/to/file.fa", out="path/to/out.afa")
gget$enrichr(list("ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"), database="ontology")
gget$archs4("ACE2", which="tissue")
gget$pdb("1R42", save=TRUE)
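Whether called from the command line, Python, or R, the pdb module writes the structure as a PDB-format text file. As a dependency-free sketch (not a gget feature), the fixed-column ATOM records of such a file can be summarized in Python. The file name `1R42.pdb` and the helpers `summarize_atoms` and `summarize_pdb` are assumptions for illustration; for real analyses a dedicated parser such as Biopython's `Bio.PDB` is the safer choice.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple


def summarize_atoms(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct residues per chain among ATOM records, using fixed PDB columns."""
    residues: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for line in lines:
        # Skip HETATM (ligands, water), headers, and short or malformed lines.
        if not line.startswith("ATOM  ") or len(line) < 27:
            continue
        chain = line[21]                # column 22: chain identifier
        res_seq = line[22:27].strip()   # columns 23-27: residue number + insertion code
        res_name = line[17:20].strip()  # columns 18-20: residue name
        residues[chain].add((res_seq, res_name))
    return {chain: len(res) for chain, res in sorted(residues.items())}


def summarize_pdb(path: str = "1R42.pdb") -> Dict[str, int]:
    """Read a PDB file from disk and return {chain: residue count}."""
    with Path(path).open() as handle:
        return summarize_atoms(handle)
```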

More examples
