  • Stars: 1,000
  • Rank: 45,878 (Top 1.0%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 1 year ago
  • Updated 3 months ago

Repository Details

scGPT

This is the official codebase for scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.

Preprint

Documentation

UPDATE: We have released several new pretrained scGPT checkpoints. Please see the Pretrained scGPT Model Zoo section for more details.

Installation

scGPT is available on PyPI. To install scGPT, run the following command:

$ pip install scgpt

[Optional] We recommend using wandb for logging and visualization.

$ pip install wandb

For development, we use the Poetry package manager. To install Poetry, follow the instructions in the official Poetry documentation.

$ git clone this-repo-url
$ cd scGPT
$ poetry install

Note: The flash-attn dependency usually requires a specific GPU and CUDA version. If you encounter any issues, please refer to the flash-attn repository for installation instructions. As of May 2023, we recommend using CUDA 11.7 and flash-attn<1.0.5 because of various issues reported when installing newer versions of flash-attn.
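Before debugging a failed install, it can help to print what is actually present in the environment. The following is a minimal sketch, not part of scGPT: the helper names (installed_version, version_tuple, flash_attn_matches_recommendation, describe_environment) are hypothetical, and it assumes the distribution is registered under the name flash-attn. It only reports versions and compares the flash-attn version against the <1.0.5 bound from the note above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.0.4' into (1, 0, 4); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def flash_attn_matches_recommendation(version):
    """True if the version is below 1.0.5, the bound recommended in the note."""
    return version_tuple(version) < (1, 0, 5)


def describe_environment():
    """Collect version information; torch is optional and only queried if importable."""
    report = {"flash-attn": installed_version("flash-attn")}  # assumed distribution name
    try:
        import torch
        report["torch"] = torch.__version__
        report["cuda (torch build)"] = torch.version.cuda
        report["gpu available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported CUDA version differs from 11.7, or the flash-attn version is at or above 1.0.5, that mismatch is a reasonable first thing to rule out.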

Pretrained scGPT Model Zoo

Here is the list of pretrained models, with links for downloading the checkpoint folders. We recommend using the whole-human model for most applications by default. If your fine-tuning dataset shares a similar cell type context with the training data of one of the organ-specific models, that model can usually demonstrate competitive performance as well.

Model name                | Description                                              | Download
whole-human (recommended) | Pretrained on 33 million normal human cells.             | link
brain                     | Pretrained on 13.2 million brain cells.                  | link
blood                     | Pretrained on 10.3 million blood and bone marrow cells.  | link
heart                     | Pretrained on 1.8 million heart cells.                   | link
lung                      | Pretrained on 2.1 million lung cells.                    | link
kidney                    | Pretrained on 814 thousand kidney cells.                 | link
pan-cancer                | Pretrained on 5.7 million cells of various cancer types. | link
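The selection guidance above can be sketched as a small lookup. This is an illustration only, not part of the scGPT package: PRETRAINED_MODELS, DEFAULT_MODEL and candidate_models are hypothetical names, and the cell counts are copied from the list of models above. The helper always lists the recommended whole-human model first and adds an organ-specific model only as an additional candidate when the tissue name matches one.

```python
# Cell counts (in millions) copied from the list of pretrained models above.
PRETRAINED_MODELS = {
    "whole-human": 33.0,
    "brain": 13.2,
    "blood": 10.3,
    "heart": 1.8,
    "lung": 2.1,
    "kidney": 0.814,
    "pan-cancer": 5.7,
}

# The model recommended for most applications.
DEFAULT_MODEL = "whole-human"


def candidate_models(tissue=None):
    """Return model names to try, with the recommended default first."""
    candidates = [DEFAULT_MODEL]
    if tissue is not None:
        key = tissue.strip().lower()
        # Add an organ-specific model only when one exists for this tissue.
        if key in PRETRAINED_MODELS and key != DEFAULT_MODEL:
            candidates.append(key)
    return candidates


if __name__ == "__main__":
    print(candidate_models("brain"))
    print(candidate_models("liver"))
```

Which candidate performs better on a given dataset still has to be checked empirically; the sketch only encodes the order in which to try them.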

Fine-tune scGPT for scRNA-seq integration

Please see our example code in examples/finetune_integration.py. By default, the script assumes that the scGPT checkpoint folder is stored in the examples/save directory.
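Before launching the script, it may be worth confirming that a downloaded checkpoint folder really sits under examples/save. The sketch below is a hypothetical helper, not part of scGPT. The file names in EXPECTED_FILES are an assumption about what a checkpoint folder contains; adjust them to match the folder you actually downloaded.

```python
from pathlib import Path

# Assumed file names; verify them against the checkpoint folder you downloaded.
EXPECTED_FILES = ("args.json", "best_model.pt", "vocab.json")


def find_checkpoint_dirs(save_dir, expected_files=EXPECTED_FILES):
    """Return the subfolders of save_dir that contain every expected file."""
    save_dir = Path(save_dir)
    if not save_dir.is_dir():
        return []
    return sorted(
        child
        for child in save_dir.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in expected_files)
    )


if __name__ == "__main__":
    found = find_checkpoint_dirs("examples/save")
    if found:
        for path in found:
            print(f"checkpoint folder: {path}")
    else:
        print("no complete checkpoint folder found under examples/save")
```

Run it from the repository root so that the relative path examples/save resolves to the directory the example script expects.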

To-do-list

  • Upload the pretrained model checkpoint
  • Publish to PyPI
  • Provide the pretraining code with generative attention masking
  • Finetuning examples for multi-omics integration, cell type annotation, perturbation prediction, cell generation
  • Example code for Gene Regulatory Network analysis
  • Documentation website with Read the Docs
  • Bump up to PyTorch 2.0
  • New pretraining on larger datasets
  • Reference mapping example
  • Publish to Hugging Face model hub

Contributing

We greatly welcome contributions to scGPT. Please submit a pull request if you have any ideas or bug fixes. We also welcome any issues you encounter while using scGPT.

Acknowledgements

We sincerely thank the authors of the following open-source projects:

Citing scGPT

@article{cui2023scGPT,
title={scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI},
author={Cui, Haotian and Wang, Chloe and Maan, Hassaan and Pang, Kuan and Luo, Fengning and Wang, Bo},
journal={bioRxiv},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}
