Ladder Side-Tuning (LST)
- Authors: Yi-Lin Sung, Jaemin Cho, Mohit Bansal
- Paper: "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning"
We propose Ladder Side-Tuning (LST), a new parameter-efficient transfer learning (PETL) technique that substantially reduces training memory requirements. Unlike existing PETL methods, which insert additional parameters inside backbone networks, LST trains a ladder side network: a small, separate network that takes intermediate activations from the backbone as input via shortcut connections (ladders) and makes the predictions.
On both GLUE and VL tasks, LST saves 2.7x more memory than other PETL methods. To further demonstrate the advantage of this memory efficiency, we also apply LST to larger T5 models (T5-large, T5-3B), attaining better GLUE performance than full fine-tuning and other PETL methods.
Run GLUE Experiments
Installation
cd Ladder-Side-Tuning
pip install -e .
# Install package for network pruning
cd Torch-Pruning
pip install -e .
pip install torch==1.10.2+cu111 torchvision==0.11.3+cu111 -f https://download.pytorch.org/whl/torch_stable.html
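# Optional sanity check (standard PyTorch calls, nothing repo-specific):
# confirm the CUDA build of torch was installed and a GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"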
Commands
# All commands are run under seq2seq/
cd seq2seq
## [Table 1]
# Run experiments for 3 seeds, where [Script.sh] is one of "scripts/baseline.sh", "scripts/adapters.sh", "scripts/lora.sh", "scripts/bitfit.sh", "scripts/prompt-tuning.sh", "scripts/ladder_side_tuning_base.sh", "scripts/ladder_side_tuning_large.sh" and "scripts/ladder_side_tuning_3b.sh"
# [GPU ID] = "0", "0,1", "2,3,4", etc
for task in "cola" "mrpc" "qnli" "rte" "sst2" "mnli" "qqp" "stsb"
do
bash [Script.sh] [GPU ID] $task
done
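# For example, to run LST with the T5-base script on CoLA using GPU 0
# (illustrative choices of script, GPU, and task):
bash scripts/ladder_side_tuning_base.sh 0 cola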
## [Table 2]
# Run experiments (except stsb) for 3 seeds.
for task in "cola" "mrpc" "qnli" "rte" "sst2" "mnli" "qqp"
do
bash scripts/ladder_side_tuning_bart.sh [GPU ID] $task
done
## [Figure 5]
# Set task_reduction_factor in scripts/adapters.sh, lora_dim in scripts/lora.sh, or r in scripts/ladder_side_tuning_base.sh to the target value, then run the loop below with the updated script (see the sketch after the loop).
for task in "cola" "mrpc" "qnli" "rte" "sst2" "mnli" "qqp" "stsb"
do
bash [Script.sh] [GPU ID] $task
done
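# A minimal sketch of sweeping the LST rank, assuming r is set as a plain
# shell variable (r=...) inside scripts/ladder_side_tuning_base.sh; adjust
# the pattern if the script passes the value differently:
sed -i 's/^r=.*/r=8/' scripts/ladder_side_tuning_base.sh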
## [Figure 6]
# Run experiments for 3 seeds, where [Script.sh] is one of "scripts/adapters_freezing.sh", "scripts/lora_freezing.sh", "scripts/ladder_side_tuning_base_dropping.sh"
for task in "cola" "mrpc" "qnli" "rte" "sst2" "mnli" "qqp" "stsb"
do
bash [Script.sh] [GPU ID] $task
done
Run VL Experiments
Installation
The environment for the VL experiments conflicts with the one used for the GLUE experiments on the transformers package (version 4.2 vs. 4.6). The other packages are compatible, so you can install everything below and overwrite transformers in the environment you used before; however, the easiest way to avoid issues is to create a stand-alone environment. A quick check of which transformers version is active is shown after the commands below.
# Create python environment (optional)
conda create -n vlt5 python=3.8
source activate vlt5
# Install python dependencies
pip install -r requirements.txt
# Download T5/BART backbone checkpoint
python download_backbones.py
# For MSCOCO captioning evaluation (optional; for captioning only)
python -c "import language_evaluation; language_evaluation.download('coco')"
# Install package for network pruning
cd Torch-Pruning
pip install -e .
pip install torch==1.10.2+cu111 torchvision==0.11.3+cu111 -f https://download.pytorch.org/whl/torch_stable.html
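# Optional: confirm which transformers version is active in this
# environment (4.2 vs. 4.6, see the note above):
python -c "import transformers; print(transformers.__version__)"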
Data Preparation
Follow VL-Adapter's README to download the data from the provided link, and put the data folder datasets under ./Ladder-Side-Tuning.
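For example, assuming the downloaded folder is named datasets and sits in your downloads directory (the source path below is hypothetical):
# adjust the source path to wherever you downloaded the data
mv ~/Downloads/datasets ./Ladder-Side-Tuning/datasets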
Commands
# All commands are run under VL-T5/
cd VL-T5
## [Table 3]
# Run experiments for 3 seeds, where [Script.sh] is one of "scripts/image/full_finetuning.sh", "scripts/image/adapter.sh", "scripts/image/lora.sh", "scripts/image/bitfit.sh", "scripts/image/prompt_tuning.sh", "scripts/image/LST.sh"
# [GPU ID] = "0", "0,1", "2,3,4", etc
# [Number of GPU you want to use] = 1, 2, 4, etc
bash [Script.sh] [GPU ID] [Number of GPU you want to use]
# e.g., LST on GPU 0 with a single GPU
bash scripts/image/LST.sh 0 1
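# Multi-GPU runs follow the same pattern; the script and GPU choices here
# are only illustrative:
bash scripts/image/adapter.sh 0,1 2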
Code Release Plan
- T5 + LST on GLUE experiments
- BART encoder + LST on GLUE experiments
- CLIP-T5 on vision-and-language tasks
Acknowledgement
This repo borrows code from Compacter, CLIP-ViL, VL-Adapter, and Torch-Pruning.
Reference
Please consider citing our paper if you use this repo in your work.
@inproceedings{Sung2022LST,
title = {LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning},
author = {Yi-Lin Sung and Jaemin Cho and Mohit Bansal},
booktitle = {arXiv},
year = {2022}
}