

Revisiting Few-sample BERT Fine-tuning


Paper: https://arxiv.org/abs/2006.05987

Authors: Tianyi Zhang*, Felix Wu*, Arzoo Katiyar, Kilian Q. Weinberger, and Yoav Artzi

*: Equal Contribution

Overview

In this paper, we study the problem of few-sample BERT fine-tuning and identify three sub-optimal practices. First, we observe that the omission of gradient bias correction in BERTAdam makes fine-tuning unstable. We also find that the top layers of BERT provide a detrimental initialization, and that simply re-initializing these layers improves convergence and performance. Finally, we observe that commonly used recipes often do not allocate sufficient time for training.
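As a rough illustration of the Re-init idea, the sketch below re-initializes the pooler and the top N Transformer blocks of a Hugging Face BertForSequenceClassification model. This is a minimal sketch of the approach described above, not the repository's exact implementation; the attribute names assume the pinned transformers 2.8 API.

# Minimal Re-init sketch (not run_glue.py's exact code): re-initialize the pooler
# and the top `num_reinit_layers` Transformer blocks of a pretrained BERT model.
import torch
from transformers import BertForSequenceClassification

def reinit_top_layers(model, num_reinit_layers=3):
    config = model.config

    def init_weights(module):
        # Mirrors BERT's default initializer: normal(0, initializer_range) for
        # Linear weights, zeros for biases, identity for LayerNorm.
        if isinstance(module, torch.nn.Linear):
            module.weight.data.normal_(mean=0.0, std=config.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, torch.nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    # Re-initialize the pooler sitting on top of the encoder.
    model.bert.pooler.apply(init_weights)
    # Re-initialize the last N encoder blocks.
    for layer in model.bert.encoder.layer[-num_reinit_layers:]:
        layer.apply(init_weights)
    return model

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model = reinit_top_layers(model, num_reinit_layers=3)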

If you find this repo useful, please cite:

@article{revisit-bert-finetuning,
  title={Revisiting Few-sample BERT Fine-tuning},
  author={Zhang, Tianyi and Wu, Felix and Katiyar, Arzoo and Weinberger, Kilian Q. and Artzi, Yoav},
  journal={arXiv preprint arXiv:2006.05987},
  year={2020}
}

Requirements

torch==1.4.0
transformers==2.8.0
apex==0.1
tqdm
tensorboardX

Please install apex following the instructions at https://github.com/NVIDIA/apex.
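A quick way to sanity-check the environment before launching jobs (the expected version numbers are the pins above; the apex import only succeeds once the NVIDIA instructions have been followed):

import torch
import transformers

print("torch:", torch.__version__)                # expected: 1.4.0
print("transformers:", transformers.__version__)  # expected: 2.8.0

try:
    from apex import amp  # mixed-precision support from NVIDIA apex
    print("apex is available")
except ImportError:
    print("apex is missing; install it from https://github.com/NVIDIA/apex")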

Usage

We provide the following sample scripts. When using these scripts, please change --data_dir, --output_dir, and --cache_dir to your data folder, output folder, and transformers cache directory, respectively.

  1. To train the BERT baseline (with debiased Adam):
     bash sample_commands/debiased_adam_baseline.sh
  2. To use Re-init:
     bash sample_commands/reinit.sh
  3. To train the model with more iterations:
     bash sample_commands/debiased_adam_longer.sh
  4. To use mixout:
     bash sample_commands/mixout.sh
  5. To use layer-wise learning rate decay:
     bash sample_commands/llrd.sh
  6. To use pretrained weight decay:
     bash sample_commands/pretrained_wd.sh

Input

You need to download the GLUE dataset using this script. Feed the path to your data through --data_dir.

Commands

We provide example commands to replicate our experiments in sample_commands.

run_glue.py contains the main program to fine-tune and evaluate models. python run_glue.py --help shows all available options.

Some key options are:

# These two options replicate our experiments on bias correction
--use_bertadam        No bias correction (replicates the behavior of BERTAdam)
--use_torch_adamw     Use PyTorch AdamW (replicates the behavior of debiased Adam)
# These two options replicate our experiments on Re-init
--reinit_pooler       Re-initialize the pooler
--reinit_layers       Re-initialize the last N Transformer blocks; --reinit_pooler must be turned on
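For reference, the sketch below shows roughly what the two bias-correction options correspond to at the optimizer level. It is an illustration under the pinned transformers 2.8 / torch 1.4 APIs, not the script's exact code.

# Rough illustration of the two bias-correction settings (not run_glue.py's exact code).
import torch
from transformers import AdamW  # transformers' AdamW exposes a correct_bias flag

params = [torch.nn.Parameter(torch.zeros(10))]

# --use_bertadam: no bias correction, i.e. the original BERTAdam behavior.
bertadam_like = AdamW(params, lr=2e-5, correct_bias=False)

# --use_torch_adamw: PyTorch's AdamW, which applies the standard Adam bias correction.
debiased = torch.optim.AdamW(params, lr=2e-5)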

Output

A standard output folder generated by run_glue.py will look like:

├── raw_log.txt
├── test_best_log.txt
├── test_last_log.txt
└── training_args.bin

The *_log.txt files are CSV files that record relevant training and evaluation results. test_best_log.txt records the test performance with the best model checkpoint during training, and test_last_log.txt records that with the last model checkpoint. training_args.bin contains all arguments used to run a job.
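Since the *_log.txt files are CSVs, they can be inspected with standard tooling. A minimal example, assuming only that the first row is a header:

import csv

# Read the log for the best checkpoint from an output folder.
with open("test_best_log.txt") as f:
    rows = list(csv.DictReader(f))

print(rows[-1])  # metrics recorded for the best checkpoint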
