  • Stars: 133
  • Rank: 272,600 (Top 6%)
  • Language: Python
  • License: Apache License 2.0
  • Created: about 3 years ago
  • Updated: over 2 years ago


Repository Details

Dynamic Token Expansion with Continual Transformers, accepted at CVPR 2022

DyTox

Transformers for Continual Learning with DYnamic TOken eXpansion

Paper (CVPR) | YouTube

DyTox main figure

Welcome to DyTox, the first transformer designed explicitly for Continual Learning!

Work led by Arthur Douillard and co-authored with Alexandre Ramé, Guillaume Couairon, and Matthieu Cord.

See our erratum here.

Installation

You first need a working Python installation, version >= 3.6.

Then create a conda environment and install the libraries listed in requirements.txt: it includes pytorch and torchvision as the building blocks of our model, continuum, which provides data loaders designed for continual learning, and timm.
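A quick sanity check of the environment can be sketched in Python (the package names are assumptions based on the libraries listed above):

```python
import sys

# The README states Python >= 3.6 is required.
assert sys.version_info >= (3, 6), "Python >= 3.6 is required"

# Check that the main dependencies from requirements.txt are importable.
for pkg in ("torch", "torchvision", "continuum", "timm"):
    try:
        __import__(pkg)
        print(f"{pkg}: ok")
    except ImportError:
        print(f"{pkg}: missing - install it via requirements.txt")
```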

Note that this code is heavily based on the great codebase of DeiT.

Launching an experiment

The CIFAR100 dataset will be auto-downloaded, but you must download ImageNet yourself.

Each command needs three options files:

  • Which dataset you want to run on and in which setting (i.e., how many steps)
  • Which class ordering; by default it is 0 -> C-1, but we used the class ordering proposed by DER (which all baselines also follow)
  • Which model version you want (DyTox, DyTox+, or DyTox++; see the supplementary material about the last one)

To launch DyTox on CIFAR100 in the 50 steps setting on the GPUs #0 and #1:

bash train.sh 0,1 \
    --options options/data/cifar100_2-2.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml \
    --name dytox \
    --data-path MY_PATH_TO_DATASET \
    --output-basedir PATH_TO_SAVE_CHECKPOINTS \
    --memory-size 1000

Folders will be auto-created with the results at logs/cifar/2-2/{DATE}/{DATE}/{DATE}-dytox.

Likewise, to launch DyTox+ and DyTox++, simply change the options; the same goes for datasets. Note that we provide 3 different class orders (from DER's implementation) for CIFAR100, and we average the results in our paper.

If you are unsure which options to use, check what is defined in the YAML option files in the ./options folder.
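Since each run combines several option files, one plausible merge convention is that later files override earlier keys; a minimal sketch (the dict keys below are hypothetical stand-ins for parsed YAML, not DyTox's actual schema):

```python
def merge_options(*option_dicts):
    """Merge option dicts left to right; later files override earlier keys."""
    merged = {}
    for opts in option_dicts:
        merged.update(opts)
    return merged

# Stand-ins for the three parsed YAML option files (hypothetical keys).
data_opts = {"dataset": "cifar100", "increment": 2}
order_opts = {"class_order": list(range(100))}
model_opts = {"model": "dytox"}

config = merge_options(data_opts, order_opts, model_opts)
print(config["dataset"], config["model"])
```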

Resuming an experiment

Some experiments can be slow and may need to be resumed, for example on ImageNet1000.

First locate the checkpoints folder (by default at ./checkpoints/ if you didn't specify an output-basedir) where your experiment first ran. Then run the following command (ImageNet1000 is used as an example, but any model and dataset works):

bash train.sh 0,1 \
    --options options/data/imagenet1000_100-100.yaml options/data/imagenet1000_order1.yaml options/model/imagenet_dytox.yaml \
    --name dytox \
    --data-path MY_PATH_TO_DATASET \
    --resume MY_PATH_TO_CKPT_FOLDER_OF_EXP \
    --start-task TASK_ID_STARTING_FROM_0_OF_WHEN_THE_EXP_HAD_STOPPED \
    --memory-size 20000
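The resume logic boils down to finding the last completed task in the checkpoint folder and passing it as --start-task. A schematic sketch (DyTox itself saves PyTorch .pth checkpoints; the JSON format and file naming here are illustrative only):

```python
import json
import os
import tempfile

def save_checkpoint(folder, task_id, state):
    # Schematic stand-in for a per-task checkpoint file.
    with open(os.path.join(folder, f"task_{task_id}.json"), "w") as f:
        json.dump({"task": task_id, "state": state}, f)

def latest_task(folder):
    # The task id to pass as --start-task when resuming an interrupted run.
    tasks = [int(name.split("_")[1].split(".")[0])
             for name in os.listdir(folder)
             if name.startswith("task_") and name.endswith(".json")]
    return max(tasks) if tasks else 0

ckpt_dir = tempfile.mkdtemp()
save_checkpoint(ckpt_dir, 0, {"acc": 0.70})
save_checkpoint(ckpt_dir, 1, {"acc": 0.65})
print(latest_task(ckpt_dir))  # resume with --start-task 1
```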

Results

ImageNet

(Figure and table: ImageNet results)

CIFAR100

(Figure and table: CIFAR100 results)

Frequently Asked Questions

Is DyTox pretrained?

  • No! It is trained from scratch for a fair comparison with previous state-of-the-art methods.

Your encoder is actually made of ConViT blocks; can I use something else, like MHSA or Swin blocks?

  • Yes! I used ConViT blocks because they train well from scratch on small datasets like CIFAR.

Can I add a new dataset?

Could I use a convolution-based backbone for the encoder instead of transformer blocks?

  • Yes! You'd need to modify the DyTox module; I already provide several CNNs. Note that for best results, you may want to remove the last block of the CNN and adjust the strides so that the final spatial features are large enough to make enough "tokens".
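The idea of turning a CNN feature map into transformer tokens can be sketched as follows (the shapes are illustrative; this is not DyTox's actual backbone code):

```python
import numpy as np

def feature_map_to_tokens(features):
    """Flatten a (B, C, H, W) CNN feature map into a (B, N, C) token
    sequence with N = H * W, as a transformer encoder expects."""
    b, c, h, w = features.shape
    return features.reshape(b, c, h * w).transpose(0, 2, 1)

# e.g. a 7x7 feature map after removing the CNN's last block
features = np.random.randn(2, 64, 7, 7)
tokens = feature_map_to_tokens(features)
print(tokens.shape)  # (2, 49, 64): 49 tokens of dimension 64
```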

Do I need to install NVIDIA's apex for mixed precision?

  • No! DyTox uses PyTorch's native mixed precision.

Can I run DyTox on a single GPU instead of two?

  • In theory, yes, although the performance is a bit lower; I'll try to find the root cause of this. On two GPUs, the results are perfectly reproducible.

What is this finetuning phase?

  • New-class data is downsampled to the same amount as the old-class data stored in the rehearsal memory, and the encoder is frozen. The option files show which modules are frozen in each task.
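The downsampling step described above can be sketched as (a schematic, not DyTox's actual sampler):

```python
import random

def downsample_new_classes(new_data, per_class):
    """Subsample each new class to `per_class` samples, matching the number
    of rehearsal samples stored per old class."""
    out = {}
    for label, samples in new_data.items():
        k = min(per_class, len(samples))
        out[label] = random.sample(samples, k)
    return out

# Two hypothetical new classes with 500 samples each; the rehearsal
# memory stores 20 samples per old class.
new_data = {5: list(range(500)), 6: list(range(500))}
balanced = downsample_new_classes(new_data, per_class=20)
print({label: len(s) for label, s in balanced.items()})  # {5: 20, 6: 20}
```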

Memory setting?

  • If you use distributed memory (the default), use 20/N images per class, with N the number of GPUs used. Thus for 2 GPUs, it's --memory-size 1000 for CIFAR100 and ImageNet100, and --memory-size 10000 for ImageNet1000. If you use global memory (--global-memory), use 20 images per class.
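That rule can be captured in a small helper (the function name is hypothetical; only the arithmetic follows the answer above):

```python
def memory_size_flag(num_classes, num_gpus, per_class_total=20,
                     global_memory=False):
    """Value to pass as --memory-size: distributed memory stores
    20/N images per class per GPU; global memory stores 20 per class."""
    per_class = per_class_total if global_memory else per_class_total // num_gpus
    return per_class * num_classes

print(memory_size_flag(100, 2))    # CIFAR100 / ImageNet100 on 2 GPUs -> 1000
print(memory_size_flag(1000, 2))   # ImageNet1000 on 2 GPUs -> 10000
```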

Distributed memory?

Why are results obtained on >= 2 GPUs slightly different from the first version of the paper?

Citation

If you compare against this model or use this code in any way, please cite us! Thanks :)

@inproceedings{douillard2021dytox,
  title     = {DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion},
  author    = {Douillard, Arthur and Ram\'e, Alexandre and Couairon, Guillaume and Cord, Matthieu},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

More Repositories

1. incremental_learning.pytorch: A collection of incremental learning paper implementations, including PODNet (ECCV20) and Ghost (CVPR-W21). (Python, 383 stars)
2. CVPR2021_PLOP: Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation. (Python, 140 stars)
3. deepcourse: Learn Deep Learning for Computer Vision in three steps: theory from basics to SotA, code in PyTorch, and spaced repetition with Anki. (Jupyter Notebook, 133 stars)
4. keras-snapshot_ensembles: Implementation in Keras of Snapshot Ensembles: Train 1, Get M for Free (https://arxiv.org/abs/1704.00109). (Python, 25 stars)
5. keras-mobilenet: Implementation in Keras of MobileNet (https://arxiv.org/abs/1704.04861). (Python, 23 stars)
6. keras-effnet: Implementation in Keras of EffNet (https://arxiv.org/abs/1801.06434). (Python, 21 stars)
7. keras-shufflenet: Implementation in Keras of ShuffleNet (https://arxiv.org/abs/1707.01083). (Python, 19 stars)
8. nalu.pytorch: Implementation of NALU & NAC (https://arxiv.org/abs/1808.00508, DeepMind) in PyTorch. (Jupyter Notebook, 17 stars)
9. mada.pytorch: Unfinished work: implementation of Multi-Adversarial Domain Adaptation (https://arxiv.org/abs/1809.02176) in PyTorch. (Python, 16 stars)
10. turing_pattern_generator: A generator of Turing patterns from an image. (Jupyter Notebook, 11 stars)
11. continual-learning-terminology: (10 stars)
12. water_simulation: Water simulation with OpenGL. (C, 10 stars)
13. tensorflow-faceid: FaceID-like system in TensorFlow using a Siamese network with contrastive loss. (Python, 10 stars)
14. awesome-deeplearning-papers: A collection of Deep Learning papers I read, sorted by category. (Python, 9 stars)
15. keras-squeeze_and_excitation_network: Implementation in Keras of Squeeze-and-Excitation (https://arxiv.org/abs/1709.01507). (Python, 6 stars)
16. Continual_Learning_Leaderboards: Leaderboards of Continual Learning for various benchmarks. (4 stars)
17. optimizers.pytorch: A collection of optimizers, from famous to exotic, implemented in PyTorch. (Python, 4 stars)
18. teledetection: Implementation in C of a custom k-means for cloud detection in satellite images. (C, 4 stars)
19. FastRadixTree: Orthographic corrector in C++ using a trie. (C++, 2 stars)
20. distributed_memory_mpi: Distributed memory with MPI in Python; also features map/reduce/filter. (Python, 1 star)
21. quiz: (JavaScript, 1 star)
22. phd_thesis: (TeX, 1 star)
23. Soundrain: SoundCloud music downloader. (Python, 1 star)
24. MoviesPopularity: An app that rates movies according to their comments; incremental learning is also done. (Scala, 1 star)
25. Reflex-Tap: A small practice website about a fictive mobile app (English & Korean versions). (HTML, 1 star)
26. Smart-Saleman: Basic solution for the salesman problem. (Python, 1 star)
27. arthurdouillard.github.io: My current blog, auto-updated from the template https://github.com/arthurdouillard/hugo-website. (HTML, 1 star)
28. Blind-Mouse-in-a-Maze: Blind Mouse in a Maze, an interview question. (C++, 1 star)
29. elix_anki_scrapper: Scraper of Elix (French Sign Language dictionary) for Anki. (Jupyter Notebook, 1 star)
30. coursera-R-programming: Assignments of the "R Programming" course on Coursera. (R, 1 star)
31. algo_with_mpi: Some basic algorithms with MPI for Python (mpi4py). (Python, 1 star)