Modern Hopfield Networks and Attention for Immune Repertoire Classification

Michael Widrich¹, Bernhard Schäfl¹, Milena Pavlović^{3 4}, Hubert Ramsauer¹, Lukas Gruber¹, Markus Holzleitner¹, Johannes Brandstetter¹, Geir Kjetil Sandve⁴, Victor Greiff³, Sepp Hochreiter^{1 2}, Günter Klambauer¹

(1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
(2) Institute of Advanced Research in Artificial Intelligence (IARAI)
(3) Department of Immunology, University of Oslo, Oslo, Norway
(4) Department of Informatics, University of Oslo, Oslo, Norway

Paper: https://arxiv.org/abs/2007.13505
Poster: neurips_poster.pdf

This package provides:

modular and customizable DeepRC implementation for massive multiple instance learning problems, such as immune repertoire classification,
CNN and LSTM sequence embedding,
single- or multi-task settings (simple building-block principle),
support for custom datasets,
examples that you can quickly adapt to your problem settings.

Will be added:

multiple attention heads/queries,
Integrated Gradients analysis (write me an email (widrich at ml.jku.at) if you urgently need a preliminary version).

Installation

pip

You can install this package via pip:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install git+https://github.com/ml-jku/DeepRC

To update your installation with dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --upgrade git+https://github.com/ml-jku/DeepRC

To update your installation without dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --no-dependencies --upgrade git+https://github.com/ml-jku/DeepRC

Usage

To run the examples, download the github repo as .zip file, extract the .zip file, and navigate into the extracted directory (you should see a deeprc folder and the README.md there).

Can't wait? Examples are here: deeprc/examples/.

Training DeepRC on pre-defined datasets

You can train a DeepRC model on the pre-defined datasets of the DeepRC paper using one of the Python files in folder deeprc/examples/examples_from_paper. The datasets will be downloaded automatically (please only download them once and then reuse the downloaded versions).

You can use tensorboard --logdir [results_directory] --port=6060 and open http://localhost:6060/ in your web-browser to view the performance.

Real-world data with implanted signals

This is category has the smallest dataset files and is a good starting point. Training a binary DeepRC classifier on dataset "0" of category "real-world data with implanted signals":

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals 0 --n_updates 10000 --evaluate_at 2000

To get more information, you can use the help function:

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals -h

LSTM-generated data

Training a binary DeepRC classifier on dataset "0" of category "LSTM-generated data":

python3 -m deeprc.examples.examples_from_paper.lstm_generated 0

Simulated immunosequencing data

Training a binary DeepRC classifier on dataset "0" of category "simulated immunosequencing data":

python3 -m deeprc.examples.examples_from_paper.simulated 0

Warning: Filesize to download is ~20GB per dataset!

Real-world data

Training a binary DeepRC classifier on dataset "real-world data":

python3 -m deeprc.examples.examples_from_paper.cmv

Training DeepRC on a custom dataset

You can train DeepRC on custom text-based datasets, such as the small example dataset deeprc/datasets/example_dataset. Specifications of the supported dataset formats are give here: deeprc/datasets/README.md.

You can change the dataset directory and task description in the examples listed below and start training a DeepRC model on your task:

Training a binary DeepRC classifier on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_single_task_cnn.py

Training DeepRC in a multi-task setting on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_multitask_cnn.py

Training DeepRC in a multi-task setting on a small random example dataset using LSTM sequence embedding:

python3 -m deeprc.examples.example_multitask_lstm.py

Datasets

The datasets will be automatically downloaded when running the examples from section "Training DeepRC on pre-defined datasets". You can also manually download the datasets here: https://ml.jku.at/research/DeepRC/datasets/. Please see our paper for descriptions of the datasets.

Structure

deeprc
      |--datasets : stores datasets
      |   |--example_dataset : Small example dataset
      |   |--README.md : Information on supported dataset formats
      |   |--splits_used_in_paper : Dataset splits as used in paper
      |--deeprc : DeepRC implementation
      |   |--architectures.py : DeepRC network architecture
      |   |--dataset_converters.py : Converter for text-based datasets
      |   |--dataset_readers.py : Tools for reading datasets
      |   |--predefined_datasets.py : Pre-defined datasets from paper
      |   |--task_definitions.py : Tools for defining the task to train DeepRC on
      |   |--training.py : Tools for training DeepRC model
      |--examples : DeepRC examples
      |   |--examples_from_paper : Examples on datasets used in paper
      |--neurips_poster.pdf : Poster from NeurIPS2020 poster session

Note

I'm currently cleaning up and uploading the code for the paper. There might be (and probably are) some bugs which will be fixed soon. If you need help with running DeepRC in the meantime, feel free to write me an email (widrich at ml.jku.at).

Best wishes,

Michael

Requirements

I relaxed the package versions to untested versions now. Please see the list below for the tested package versions and let me know if some higher package version fails.

Python3.6.9 or higher
Python packages:
- Pytorch (tested with version 1.3.1)
- numpy (tested with version 1.18.2)
- h5py (tested with version 2.9.0)
- dill (tested with version 0.3.0)
- pandas (tested with version 0.24.2)
- tqdm (tested with version 0.24.2)
- scikit-learn (tested with version 0.22.2.post1)
- requests (tested with version 2.21.0)
- tensorboard (tested with version 1.15.0)
- widis-lstm-tools (tested with version 0.4)

ml-jku/DeepRC

ml-jku

Reviews

Repository Details