• Stars: 160
• Language: Python
• License: MIT License
• Created over 6 years ago; updated over 1 year ago

A generative latent variable model for biological sequence families.

DeepSequence

DeepSequence is a generative, unsupervised latent variable model for biological sequences. Given a multiple sequence alignment as input, it can be used to predict accessible mutations, extract quantitative features for supervised learning, and generate libraries of new sequences satisfying apparent constraints. It models higher-order dependencies in sequences as a nonlinear combination of constraints between subsets of residues. For more information, see the paper on bioRxiv and the examples below.
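As a rough illustration of the kind of model described above, the following Python sketch computes an evidence lower bound (ELBO) for a single one-hot-encoded sequence under a toy latent variable model, and scores a mutation as the ELBO difference between mutant and wild type. Everything here is an assumption for illustration: the 20-letter alphabet, the 2-dimensional Gaussian latent, the random linear encoder and decoder, and the names `elbo` and `mutation_effect`. It is not DeepSequence's architecture or API, and the ELBO-difference score is a common choice for models of this kind rather than something this README specifies.

```python
import numpy as np

# Toy setup (hypothetical): 20 amino acids, length-5 sequences, 2-D latent space.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
L, Q, Z = 5, len(ALPHABET), 2

rng = np.random.default_rng(0)
# Random linear weights stand in for trained encoder/decoder networks.
W_mu = rng.normal(scale=0.1, size=(L * Q, Z))
W_logvar = rng.normal(scale=0.1, size=(L * Q, Z))
W_dec = rng.normal(scale=0.1, size=(Z, L * Q))
b_dec = np.zeros(L * Q)


def one_hot(seq):
    """Encode a length-L sequence as an (L, Q) one-hot matrix."""
    x = np.zeros((L, Q))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along one axis."""
    m = logits.max(axis=axis, keepdims=True)
    shifted = logits - m
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def elbo(seq, n_samples=64, seed=1):
    """Monte Carlo ELBO: E_q[log p(x|z)] - KL(q(z|x) || N(0, I))."""
    x = one_hot(seq)
    flat = x.ravel()
    mu = flat @ W_mu
    logvar = flat @ W_logvar
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    # Reparameterization trick: z = mu + sigma * eps.
    eps = np.random.default_rng(seed).standard_normal((n_samples, Z))
    z = mu + np.exp(0.5 * logvar) * eps
    logits = (z @ W_dec + b_dec).reshape(n_samples, L, Q)
    # Log-likelihood of the observed residues at every position, per sample.
    log_px = (log_softmax(logits) * x).sum(axis=(1, 2))
    return log_px.mean() - kl


def mutation_effect(wt, mutant):
    """Score a mutant by its ELBO relative to the wild type."""
    return elbo(mutant) - elbo(wt)


print(mutation_effect("ACDEF", "ACDEW"))
```

Both calls reuse the same noise seed (common random numbers), so the Monte Carlo noise largely cancels in the difference; scoring a sequence against itself gives exactly zero.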

For ease of analysis, we advise that alignments be generated with the EVcouplings package, though any sequence alignment can be used.
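The input is a multiple sequence alignment. The hypothetical helpers below, which are not part of DeepSequence or EVcouplings, sketch one way to read an aligned FASTA file and turn it into the one-hot array that a model of this kind typically consumes. They assume all aligned sequences have the same length, and they encode gaps and any character outside the 20 uppercase amino-acid letters (including lowercase insert-state letters) as all-zero rows; the real preprocessing may treat these differently.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def read_aligned_fasta(text):
    """Parse aligned FASTA text into a list of (name, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], []])
        elif records:
            # Sequences may be wrapped over several lines.
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def one_hot_alignment(seqs):
    """Return an (N, length, 20) float32 array; unknown characters stay all-zero."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    x = np.zeros((len(seqs), length, len(ALPHABET)), dtype=np.float32)
    for n, seq in enumerate(seqs):
        for i, ch in enumerate(seq):
            j = index.get(ch)
            if j is not None:
                x[n, i, j] = 1.0
    return x


records = read_aligned_fasta(">seq1\nAC-\n>seq2\nA\nDE\n")
x = one_hot_alignment([seq for _, seq in records])
print(x.shape)
```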

The codebase is compatible with Python 2.7 and Theano 1.0.1. For GPU-enabled computation, CUDA must be installed separately. See INSTALL for more details.

Examples

For reasonable training time, we advise training DeepSequence on a GPU:

THEANO_FLAGS='floatX=float32,device=cuda' python run_svi.py

However, it can be run on the CPU with:

python run_svi.py
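For scripted runs, a small wrapper can reproduce the two commands above. The function names below are hypothetical and not part of the repository; the sketch assumes `run_svi.py` is in the working directory and that `python` on the PATH is the Python 2.7 interpreter the codebase requires. Only the GPU case sets `THEANO_FLAGS`, mirroring the commands as written.

```python
import os
import subprocess

# Flags copied from the GPU command above.
GPU_THEANO_FLAGS = "floatX=float32,device=cuda"


def build_run(use_gpu, script="run_svi.py", python="python"):
    """Return (argv, env) for launching the training script (hypothetical helper)."""
    env = dict(os.environ)  # copy so the caller's environment is left untouched
    if use_gpu:
        env["THEANO_FLAGS"] = GPU_THEANO_FLAGS
    return [python, script], env


def launch(use_gpu):
    """Run the training script and return its exit code."""
    argv, env = build_run(use_gpu)
    return subprocess.call(argv, env=env)
```

`launch(True)` corresponds to the GPU command and `launch(False)` to the CPU one. In the CPU case any `THEANO_FLAGS` already set in the shell pass through unchanged, just as they would with the plain command.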

Further usage examples and analysis features are available as IPython notebooks in the examples subfolder.

More Repositories

• EVcouplings (Jupyter Notebook, 189 stars): Evolutionary couplings from protein and RNA sequence alignments
• plmc (C, 98 stars): Inference of couplings in proteins and RNAs from sequence variation
• SeqDesign (Python, 65 stars): Protein design and variant prediction using autoregressive generative models
• EVmutation (HTML, 45 stars): Mutation effects predicted from sequence co-variation
• EVzoom (JavaScript, 34 stars): Visually explore covariation in protein families
• neural-fingerprint-theano (HTML, 32 stars): Visual Convolutional Neural Graph Fingerprints in Theano/Lasagne
• variational-synthesis (Python, 25 stars): Repository for the paper "Optimal design of stochastic DNA synthesis protocols based on generative sequence models" (Weinstein et al., AISTATS, 2022)
• NEMO (Python, 25 stars): Learning protein structure with a differentiable simulator
• BEAR (Python, 12 stars): This repository is for the paper "A generative nonparametric Bayesian model for whole genomes"
• MuE (Python, 12 stars): A package for making MuE observation models in Edward2
• 3D_from_DMS_Extended_Data (Jupyter Notebook, 10 stars)
• variants_pharmacogenes (Jupyter Notebook, 8 stars): This repository contains the code used to analyse data and produce figures for the manuscript "Genetic variation in human drug-related genes"
• persistent-vi (C, 7 stars): Variational Bayes for discrete undirected models
• detectDesign (Jupyter Notebook, 7 stars): Toolkit for finding likely Cas9 off-target binding and effect on gene expression, designing sgRNAs and pairs of sgRNAs with minimal off-target effect on gene expression
• nanobody-polyreactivity (Python, 6 stars): Polyreactivity Website
• GELMMnet (Jupyter Notebook, 3 stars): Generalized linear mixed model elastic net