  • Language
    Jupyter Notebook
  • License
    MIT License


scGPT

This is the official codebase for scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.

Preprint

Documentation

UPDATE: We have released several new pretrained scGPT checkpoints. Please see the Pretrained scGPT Model Zoo section for details.

Installation

scGPT is available on PyPI. To install scGPT, run the following command:

$ pip install scgpt

[Optional] We recommend using wandb for logging and visualization.

$ pip install wandb

For development, we use the Poetry package manager. To install Poetry, follow the official Poetry installation instructions.

$ git clone this-repo-url
$ cd scGPT
$ poetry install

Note: The flash-attn dependency usually requires a specific GPU and CUDA version. If you run into installation problems, please refer to the flash-attn repository for instructions. As of May 2023, we recommend CUDA 11.7 and flash-attn<1.0.5, because of various reported issues with installing newer versions of flash-attn.
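A minimal sketch for checking whether an already-installed flash-attn respects the `<1.0.5` pin, using only the standard library. The helper names are hypothetical. The distribution name `flash-attn` is an assumption about how the package is registered in your environment. The version parser is deliberately simplified: it compares only the leading numeric release segment and ignores pre- and post-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version

# Exclusive upper bound from the recommendation above (flash-attn<1.0.5).
UPPER_EXCLUSIVE = (1, 0, 5)


def parse_release(ver: str) -> tuple:
    """Return the leading numeric release segment, e.g. '1.0.4.post1' -> (1, 0, 4).

    Simplified parser (not full PEP 440): it stops at the first non-numeric
    suffix, so pre-/post-release tags are ignored.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # the piece carried a suffix such as '5rc1'
            break
    return tuple(parts)


def check_flash_attn(upper_exclusive=UPPER_EXCLUSIVE) -> str:
    """Report whether the installed flash-attn is below the recommended bound."""
    bound = ".".join(str(n) for n in upper_exclusive)
    try:
        installed = version("flash-attn")  # assumed distribution name
    except PackageNotFoundError:
        return "flash-attn is not installed"
    if parse_release(installed) < tuple(upper_exclusive):
        return f"flash-attn {installed} satisfies <{bound}"
    return f"flash-attn {installed} is not below {bound}; consider pinning flash-attn<{bound}"


if __name__ == "__main__":
    print(check_flash_attn())
```

With this simplified parser, a pre-release such as 1.0.5rc1 is treated as 1.0.5 and reported as not below the bound. Use a full PEP 440 parser (for example, the `packaging` library) if that distinction matters.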

Pretrained scGPT Model Zoo

Below is the list of pretrained models, with links for downloading each checkpoint folder. We recommend the whole-human model as the default for most applications. If your fine-tuning dataset shares a similar cell-type context with the training data of an organ-specific model, that model can often deliver competitive performance as well.

Model name                | Description                                              | Download
------------------------- | -------------------------------------------------------- | --------
whole-human (recommended) | Pretrained on 33 million normal human cells.             | link
brain                     | Pretrained on 13.2 million brain cells.                  | link
blood                     | Pretrained on 10.3 million blood and bone marrow cells.  | link
heart                     | Pretrained on 1.8 million heart cells.                   | link
lung                      | Pretrained on 2.1 million lung cells.                    | link
kidney                    | Pretrained on 814 thousand kidney cells.                 | link
pan-cancer                | Pretrained on 5.7 million cells of various cancer types. | link
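After downloading a checkpoint folder, a quick sanity check can catch incomplete downloads before fine-tuning. The sketch below assumes that each folder contains `args.json` (model settings), `best_model.pt` (weights), and `vocab.json` (gene vocabulary). These file names are an assumption, so adjust `EXPECTED_FILES` to match what you actually downloaded. The helper names and the example folder path are hypothetical, and only the standard library is used.

```python
import json
from pathlib import Path

# Assumed contents of a pretrained checkpoint folder; adjust to your download.
EXPECTED_FILES = ("args.json", "best_model.pt", "vocab.json")


def missing_checkpoint_files(model_dir) -> list:
    """Return the expected file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


def load_model_args(model_dir) -> dict:
    """Read the (assumed) args.json settings file as a dict."""
    with open(Path(model_dir) / "args.json", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical folder name for the whole-human checkpoint.
    folder = "examples/save/scGPT_human"
    missing = missing_checkpoint_files(folder)
    if missing:
        print(f"{folder} is missing: {', '.join(missing)}")
    else:
        print(f"{folder} looks complete; args.json has {len(load_model_args(folder))} entries")
```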

Fine-tune scGPT for scRNA-seq integration

Please see our example code in examples/finetune_integration.py. By default, the script assumes that the scGPT checkpoint folder is stored in the examples/save directory.
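Before launching the integration script, you can confirm that the checkpoint sits where it is expected. The sketch below builds the path `examples/save/<folder>` relative to the repository root and raises a readable error if the folder is absent. The helper names are hypothetical, and `scGPT_human` is only an assumed folder name, so substitute the name of the folder you downloaded.

```python
from pathlib import Path


def checkpoint_dir(repo_root, folder_name: str) -> Path:
    """Build the default checkpoint location: <repo_root>/examples/save/<folder_name>."""
    return Path(repo_root) / "examples" / "save" / folder_name


def require_checkpoint(repo_root, folder_name: str) -> Path:
    """Return the checkpoint folder, or raise if it has not been placed there yet."""
    path = checkpoint_dir(repo_root, folder_name)
    if not path.is_dir():
        raise FileNotFoundError(
            f"No checkpoint folder at {path}; download one from the model zoo "
            "and move it into examples/save/."
        )
    return path


if __name__ == "__main__":
    # "scGPT_human" is a hypothetical folder name; use the one you downloaded.
    print(checkpoint_dir(".", "scGPT_human"))
```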

To-do list

  • Upload the pretrained model checkpoint
  • Publish to PyPI
  • Provide the pretraining code with generative attention masking
  • Fine-tuning examples for multi-omics integration, cell-type annotation, perturbation prediction, and cell generation
  • Example code for gene regulatory network (GRN) analysis
  • Documentation website on Read the Docs
  • Upgrade to PyTorch 2.0
  • New pretraining on larger datasets
  • Reference mapping example
  • Publish to the Hugging Face Model Hub

Contributing

We greatly welcome contributions to scGPT. Please submit a pull request if you have ideas or bug fixes. We also welcome reports of any issues you encounter while using scGPT.

Acknowledgements

We sincerely thank the authors of the following open-source projects:

Citing scGPT

@article{cui2023scGPT,
title={scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI},
author={Cui, Haotian and Wang, Chloe and Maan, Hassaan and Pang, Kuan and Luo, Fengning and Wang, Bo},
journal={bioRxiv},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}
