  • Language
    Python
  • License
    BSD 3-Clause "New...
  • Created over 6 years ago
  • Updated over 5 years ago

This is the legacy version of the code, used for our rejected ICML 2018 submission and kept here for reference.

Please check out the multigpu branch for the latest (cleaner) version, with the newer experiments reported in our EMNLP 2018 paper.

==================================

Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

PyTorch implementation of the models described in the paper Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement.

We present code for training and decoding both autoregressive and non-autoregressive models, as well as preprocessed datasets and pretrained models.

Dependencies

Python

  • Python 3.6
  • PyTorch 0.3
  • NumPy
  • NLTK
  • torchtext
  • torchvision

GPU

  • CUDA (we recommend the latest version; version 8.0 was used in all our experiments)

Related code

Downloading Datasets & Pre-trained Models

The original corpora can be downloaded from their respective sources (IWSLT'16 En-De, WMT'16 En-Ro, WMT'15 En-De, MS COCO). For the preprocessed corpora and pre-trained models, see below.

Dataset          Preprocessed data   Pre-trained models
IWSLT'16 En-De   Data                Models
WMT'16 En-Ro     Data                Models
WMT'15 En-De     Data                Models
MS COCO          Data                Models

Before you run the code

Set the correct path to the data in the data_path() function located in data.py.
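data.py itself is not reproduced in this README. Purely as an illustration, the sketch below assumes data_path() takes a dataset name and returns the directory holding its preprocessed files. The root directory, dataset keys, and sub-directory layout are hypothetical placeholders; adapt them to whatever the real function in data.py uses.

```python
import os

# HYPOTHETICAL: root directory, dataset keys and layout are placeholders,
# not the contents of the real data.py.
DATA_ROOT = os.path.expanduser("~/data/iterative-refinement")

_SUBDIRS = {
    "iwslt16-ende": os.path.join("iwslt16", "en-de"),
    "wmt16-enro": os.path.join("wmt16", "en-ro"),
    "wmt15-ende": os.path.join("wmt15", "en-de"),
    "mscoco": "mscoco",
}


def data_path(dataset: str) -> str:
    """Return the directory holding the preprocessed files for `dataset`."""
    if dataset not in _SUBDIRS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return os.path.join(DATA_ROOT, _SUBDIRS[dataset])


if __name__ == "__main__":
    # Report missing directories early instead of failing inside run.py.
    for name in _SUBDIRS:
        path = data_path(name)
        status = "ok" if os.path.isdir(path) else "MISSING"
        print(f"{name:14s} {status:8s} {path}")
```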

Loading & Decoding from Pre-trained Models

  1. For vocab_size, use 60000 for WMT'15 En-De, 40000 for the other translation datasets and 10000 for MS COCO.
  2. For params, use big for WMT'15 En-De and small for the other translation datasets.
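The two rules above amount to a small lookup table. The hypothetical helper below (not part of the repository) encodes them and assembles the autoregressive decoding command that follows. Only the vocab_size and params values come from this README; the dataset keys are placeholders, so check run.py for the names --dataset actually accepts.

```python
import shlex
from typing import Dict, List, Tuple

# HYPOTHETICAL dataset keys; the (vocab_size, params) values follow the rules
# above. MS COCO (vocab_size 10000) has its own commands further down.
SETTINGS: Dict[str, Tuple[int, str]] = {
    "iwslt16-ende": (40000, "small"),  # IWSLT'16 En-De
    "wmt16-enro": (40000, "small"),    # WMT'16 En-Ro
    "wmt15-ende": (60000, "big"),      # WMT'15 En-De
}


def ar_test_command(dataset: str, checkpoint: str) -> List[str]:
    """Return argv for decoding with a pre-trained autoregressive model."""
    vocab_size, params = SETTINGS[dataset]
    return [
        "python", "run.py",
        "--dataset", dataset,
        "--vocab_size", str(vocab_size),
        "--ffw_block", "highway",
        "--params", params,
        "--lr_schedule", "anneal",
        "--mode", "test",
        "--debug",
        "--load_from", checkpoint,
    ]


if __name__ == "__main__":
    cmd = ar_test_command("wmt15-ende", "path/to/checkpoint")
    # shlex.quote keeps this compatible with Python 3.6 (no shlex.join).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as an argv list rather than a string avoids quoting problems when checkpoint paths contain spaces.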

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --mode test --debug --load_from <checkpoint>

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --trg_len_option predict --use_predicted_trg_len --load_from <checkpoint>

For adaptive decoding, add the flag --adaptive_decoding jaccard to the above.
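The exact stopping rule behind --adaptive_decoding jaccard is implemented in the repository code and not spelled out here. As a hedged sketch, the loop below assumes refinement stops once the token sets of two consecutive iterations reach a Jaccard similarity threshold (1.0 by default, meaning identical sets), with max_iters playing the role of the iteration cap set by --valid_repeat_dec. The `refine` callable is a hypothetical stand-in for one decoder pass.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity |A & B| / |A | B| between the token sets of a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty outputs are treated as identical
    return len(sa & sb) / len(sa | sb)


def adaptive_refine(
    refine: Callable[[List[int]], List[int]],
    init: List[int],
    max_iters: int = 20,
    threshold: float = 1.0,
) -> Tuple[List[int], int]:
    """Refine `init` until two consecutive outputs are similar enough.

    Returns the final output and the number of refinement steps taken.
    """
    prev = init
    for step in range(1, max_iters + 1):
        cur = refine(prev)
        if jaccard(prev, cur) >= threshold:
            return cur, step  # output has stopped changing (as a set)
        prev = cur
    return prev, max_iters  # hit the iteration cap


if __name__ == "__main__":
    target = [5, 6, 7, 8]

    def toy_refine(y: List[int]) -> List[int]:
        # Hypothetical decoder pass: fixes the first wrong token per call.
        y = list(y)
        for i, (tok, gold) in enumerate(zip(y, target)):
            if tok != gold:
                y[i] = gold
                break
        return y

    out, steps = adaptive_refine(toy_refine, [0, 0, 0, 0])
    print(out, steps)  # [5, 6, 7, 8] 5
```

Because Jaccard similarity compares sets, two outputs that differ only in token order or in repeated tokens count as identical under this criterion; comparing the sequences directly would be a stricter alternative.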

Training New Models

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob <denoising_prob> --layerwise_denoising_weight --use_distillation

Training the Length Prediction Model

  1. Take a pre-trained non-autoregressive model checkpoint.
  2. Resume training with the following flags, in addition to the ones originally used to train that model: --load_from <checkpoint> --resume --finetune_trg_len --trg_len_option predict
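Step 2 amounts to appending a fixed set of flags to the original training command. The hypothetical helper below (not part of the repository) does that. Its example base command is an abbreviated version of the non-autoregressive training command above, with illustrative values substituted for the placeholders and the denoising flags left out for brevity.

```python
import shlex
from typing import List

# Flags from step 2 that switch training over to the length predictor.
LENGTH_FLAGS = ["--resume", "--finetune_trg_len", "--trg_len_option", "predict"]


def length_finetune_command(nar_train_cmd: List[str], checkpoint: str) -> List[str]:
    """Return the NAR training argv extended with the step-2 flags."""
    if "--load_from" in nar_train_cmd:
        # Refuse ambiguous input rather than passing --load_from twice.
        raise ValueError("base command already sets --load_from")
    return list(nar_train_cmd) + ["--load_from", checkpoint] + LENGTH_FLAGS


if __name__ == "__main__":
    # Abbreviated NAR training command; dataset key and values are illustrative.
    base = [
        "python", "run.py", "--dataset", "wmt16-enro", "--vocab_size", "40000",
        "--ffw_block", "highway", "--params", "small", "--lr_schedule", "anneal",
        "--fast", "--valid_repeat_dec", "8", "--use_argmax",
        "--next_dec_input", "both", "--use_distillation",
    ]
    cmd = length_finetune_command(base, "path/to/nar_checkpoint")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```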

MS COCO dataset

  • Run pre-trained autoregressive model
    $ python run.py --dataset mscoco --params big --load_vocab --mode test --n_layers 4 --ffw_block highway --debug --load_from mscoco_models_final/ar_model --batch_size 1024
  • Run pre-trained non-autoregressive model
    $ python run.py --dataset mscoco --params big --use_argmax --load_vocab --mode test --n_layers 4 --fast --ffw_block highway --debug --trg_len_option predict --use_predicted_trg_len --load_from mscoco_models_final/nar_model --batch_size 1024
  • Train new autoregressive model
    $ python run.py --dataset mscoco --params big --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4
  • Train new non-autoregressive model
    $ python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4 --fast --use_distillation --ffw_block highway --denoising_prob 0.5 --layerwise_denoising_weight --load_encoder_from mscoco_models_final/ar_model

After training the non-autoregressive model, train the length predictor (set the correct checkpoint path in the --load_from argument):

$ python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --mode train --n_layers 4 --fast --ffw_block highway --eval_every 1000 --drop_ratio 0.5 --drop_len_pred 0.0 --lr_schedule anneal --anneal_steps 100000 --use_distillation --load_from mscoco_models/new_nar_model --trg_len_option predict --finetune_trg_len --max_offset 20
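The three MS COCO training commands depend on each other: the non-autoregressive model takes its encoder from an autoregressive checkpoint (--load_encoder_from), and the length predictor is fine-tuned from the non-autoregressive checkpoint (--load_from). The sketch below chains them with subprocess and stops at the first failing stage. It is not part of the repository; the flags are the ones listed above (reordered, which argparse does not mind), the checkpoint paths are parameters because where each run saves its model is not stated here, and dry_run=True only prints the commands.

```python
import shlex
import subprocess
from typing import List, Tuple

# Flags shared by all three MS COCO training commands.
COMMON = ["--dataset", "mscoco", "--params", "big", "--batch_size", "1024",
          "--load_vocab", "--n_layers", "4", "--eval_every", "1000",
          "--drop_ratio", "0.5"]


def mscoco_stages(ar_ckpt: str, nar_ckpt: str) -> List[Tuple[str, List[str]]]:
    """Return (name, argv) pairs in the order they must run."""
    run = ["python", "run.py"] + COMMON
    return [
        ("ar", run + ["--lr_schedule", "transformer"]),
        ("nar", run + ["--use_argmax", "--lr_schedule", "transformer", "--fast",
                       "--use_distillation", "--ffw_block", "highway",
                       "--denoising_prob", "0.5", "--layerwise_denoising_weight",
                       "--load_encoder_from", ar_ckpt]),
        ("length", run + ["--use_argmax", "--mode", "train", "--fast",
                          "--ffw_block", "highway", "--drop_len_pred", "0.0",
                          "--lr_schedule", "anneal", "--anneal_steps", "100000",
                          "--use_distillation", "--load_from", nar_ckpt,
                          "--trg_len_option", "predict", "--finetune_trg_len",
                          "--max_offset", "20"]),
    ]


def run_pipeline(stages: List[Tuple[str, List[str]]], dry_run: bool = True) -> None:
    """Print each stage's command and, unless dry_run, execute it in order."""
    for name, argv in stages:
        print(f"[{name}] " + " ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline(mscoco_stages("mscoco_models_final/ar_model",
                               "mscoco_models/new_nar_model"))
```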

Citation

If you find the resources in this repository useful, please consider citing:

@article{Lee:18,
  author    = {Jason Lee and Elman Mansimov and Kyunghyun Cho},
  title     = {Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement},
  year      = {2018},
  journal   = {arXiv preprint arXiv:1802.06901},
}