DeepRC: Immune repertoire classification with attention-based deep massive multiple instance learning

Modern Hopfield Networks and Attention for Immune Repertoire Classification

Michael Widrich1, Bernhard Schäfl1, Milena Pavlović3 4, Hubert Ramsauer1, Lukas Gruber1, Markus Holzleitner1, Johannes Brandstetter1, Geir Kjetil Sandve4, Victor Greiff3, Sepp Hochreiter1 2, Günter Klambauer1

(1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
(2) Institute of Advanced Research in Artificial Intelligence (IARAI)
(3) Department of Immunology, University of Oslo, Oslo, Norway
(4) Department of Informatics, University of Oslo, Oslo, Norway

This package provides:

  • modular and customizable DeepRC implementation for massive multiple instance learning problems, such as immune repertoire classification,
  • CNN and LSTM sequence embedding,
  • single- or multi-task settings (simple building-block principle),
  • support for custom datasets,
  • examples that you can quickly adapt to your problem settings.
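As a rough illustration of the attention-based pooling idea behind multiple instance learning, the sketch below reduces a bag of per-sequence feature vectors to a single repertoire-level vector. It is not the DeepRC implementation: the function name `attention_pool`, the fixed `query` vector, and the use of the raw feature vectors as both keys and values are assumptions made for this example only.

```python
import math


def attention_pool(instances, query):
    """Pool a bag of instance vectors into one vector via softmax attention.

    instances: list of equal-length feature vectors (one per sequence).
    query: hypothetical fixed query vector of the same length.
    Returns (pooled_vector, attention_weights).
    """
    # Scaled dot-product score between each instance and the query
    scale = math.sqrt(len(query))
    scores = [sum(x * q for x, q in zip(inst, query)) / scale for inst in instances]

    # Numerically stable softmax over all instances in the bag
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the instance vectors
    dim = len(instances[0])
    pooled = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Toy bag of three 2-dimensional "sequence embeddings" (made-up numbers)
    bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    pooled, weights = attention_pool(bag, query=[1.0, 1.0])
    print("attention weights:", weights)
    print("pooled representation:", pooled)
```

The pooled vector has a fixed size regardless of how many instances the bag contains, which is what allows a classifier to operate on bags with very many sequences.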

Will be added:

  • multiple attention heads/queries,
  • Integrated Gradients analysis (write me an email (widrich at ml.jku.at) if you urgently need a preliminary version).

Installation

pip

You can install this package via pip:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install git+https://github.com/ml-jku/DeepRC

To update your installation with dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --upgrade git+https://github.com/ml-jku/DeepRC

To update your installation without dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --no-dependencies --upgrade git+https://github.com/ml-jku/DeepRC
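After installing, a quick check along the following lines can confirm that the packages are visible to your Python interpreter. This is only a sketch: `deeprc` is the module name used by the example commands in this README, while `widis_lstm_tools` is an assumed import name for widis-lstm-tools and may need adjusting.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "widis_lstm_tools" is an ASSUMED import name, not confirmed by this README
    for name in ("deeprc", "widis_lstm_tools"):
        status = "found" if is_installed(name) else "NOT found"
        print(f"{name}: {status}")
```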

Usage

To run the examples, download the GitHub repo as a .zip file, extract it, and navigate into the extracted directory (you should see a deeprc folder and the README.md there).

Can't wait? Examples are here: deeprc/examples/.

Training DeepRC on pre-defined datasets

You can train a DeepRC model on the pre-defined datasets of the DeepRC paper using one of the Python files in folder deeprc/examples/examples_from_paper. The datasets will be downloaded automatically (please only download them once and then reuse the downloaded versions).

To view the training performance, run tensorboard --logdir [results_directory] --port=6060 and open http://localhost:6060/ in your web browser.

Real-world data with implanted signals

This category has the smallest dataset files and is a good starting point. Training a binary DeepRC classifier on dataset "0" of category "real-world data with implanted signals":

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals 0 --n_updates 10000 --evaluate_at 2000
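If you would rather launch such a run from a Python script (for example, to loop over several dataset IDs), the command above could be assembled as in the sketch below. The helper name `build_command` and its defaults are hypothetical; only the module path and the `--n_updates` and `--evaluate_at` flags are taken from the command above.

```python
import subprocess
import sys

MODULE = "deeprc.examples.examples_from_paper.cmv_with_implanted_signals"


def build_command(dataset_id: int, n_updates: int = 10000, evaluate_at: int = 2000) -> list:
    """Assemble the argument list for one training run (hypothetical helper)."""
    return [
        sys.executable, "-m", MODULE,
        str(dataset_id),
        "--n_updates", str(n_updates),
        "--evaluate_at", str(evaluate_at),
    ]


if __name__ == "__main__":
    cmd = build_command(0)
    print("Would run:", " ".join(cmd))
    # Uncomment to start training; this downloads the dataset on first use.
    # subprocess.run(cmd, check=True)
```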

To get more information, you can use the help function:

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals -h

LSTM-generated data

Training a binary DeepRC classifier on dataset "0" of category "LSTM-generated data":

python3 -m deeprc.examples.examples_from_paper.lstm_generated 0

Simulated immunosequencing data

Training a binary DeepRC classifier on dataset "0" of category "simulated immunosequencing data":

python3 -m deeprc.examples.examples_from_paper.simulated 0

Warning: The download size is ~20GB per dataset!

Real-world data

Training a binary DeepRC classifier on dataset "real-world data":

python3 -m deeprc.examples.examples_from_paper.cmv

Training DeepRC on a custom dataset

You can train DeepRC on custom text-based datasets, such as the small example dataset deeprc/datasets/example_dataset. Specifications of the supported dataset formats are given here: deeprc/datasets/README.md.

You can change the dataset directory and task description in the examples listed below and start training a DeepRC model on your task:

Training a binary DeepRC classifier on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_single_task_cnn

Training DeepRC in a multi-task setting on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_multitask_cnn

Training DeepRC in a multi-task setting on a small random example dataset using LSTM sequence embedding:

python3 -m deeprc.examples.example_multitask_lstm

Datasets

The datasets will be automatically downloaded when running the examples from section "Training DeepRC on pre-defined datasets". You can also manually download the datasets here: https://ml.jku.at/research/DeepRC/datasets/. Please see our paper for descriptions of the datasets.

Structure

deeprc
      |--datasets : stores datasets
      |   |--example_dataset : Small example dataset
      |   |--README.md : Information on supported dataset formats
      |   |--splits_used_in_paper : Dataset splits as used in paper
      |--deeprc : DeepRC implementation
      |   |--architectures.py : DeepRC network architecture
      |   |--dataset_converters.py : Converter for text-based datasets
      |   |--dataset_readers.py : Tools for reading datasets
      |   |--predefined_datasets.py : Pre-defined datasets from paper
      |   |--task_definitions.py : Tools for defining the task to train DeepRC on
      |   |--training.py : Tools for training DeepRC model
      |--examples : DeepRC examples
      |   |--examples_from_paper : Examples on datasets used in paper
      |--neurips_poster.pdf : Poster from NeurIPS2020 poster session

Note

I'm currently cleaning up and uploading the code for the paper. There might be (and probably are) some bugs which will be fixed soon. If you need help with running DeepRC in the meantime, feel free to write me an email (widrich at ml.jku.at).

Best wishes,

Michael

Requirements

I have now relaxed the package version requirements to include untested versions. Please see the list below for the tested package versions, and let me know if a higher package version fails.
