  • Stars: 2,103
  • Rank: 21,948 (top 0.5%)
  • Language: Python
  • License: MIT License
  • Created about 7 years ago
  • Updated almost 3 years ago

Repository Details

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

News

SRU++, a new SRU variant, is released. [tech report] [blog]

The experimental code and SRU++ implementation are available on the dev branch, which will be merged into master later.

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM without loss of accuracy, as tested on many tasks.


[Figure] Average processing time of LSTM, conv2d and SRU, tested on a GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.
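The speed-up comes from restructuring the recurrence: all matrix multiplications in SRU depend only on the input, not on the previous hidden state, so they can be batched across the whole sequence; only cheap element-wise operations remain sequential. A minimal single-dimension sketch of the recurrence described in the paper (variable names are illustrative, not the library's API):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sru_step(x_tilde, xf, xr, c_prev, x, vf=0.0, vr=0.0, bf=0.0, br=0.0):
    """One SRU timestep for a single hidden dimension.

    x_tilde, xf, xr stand for the input projections W*x_t, W_f*x_t, W_r*x_t.
    In the CUDA kernel these are computed for ALL timesteps in one batched
    matrix multiply; only the element-wise recurrence below is sequential.
    """
    f = sigmoid(xf + vf * c_prev + bf)    # forget gate
    c = f * c_prev + (1.0 - f) * x_tilde  # internal state (element-wise recurrence)
    r = sigmoid(xr + vr * c_prev + br)    # reset/highway gate
    h = r * c + (1.0 - r) * x             # highway connection to the raw input
    return h, c

# Run the recurrence over a short sequence, starting from c = 0.
c = 0.0
for x in [1.0, -0.5, 2.0]:
    h, c = sru_step(x_tilde=x, xf=0.0, xr=0.0, c_prev=c, x=x)
```

Because `c` is updated only element-wise, each hidden dimension evolves independently, which is what allows the per-timestep work to be fused into a single fast kernel.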

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence [paper]

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute [paper]

@article{lei2021srupp,
  title={When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute},
  author={Tao Lei},
  journal={arXiv preprint arXiv:2102.12459},
  year={2021}
}

Requirements

Install requirements via pip install -r requirements.txt.


Installation

From source:

SRU can be installed as a regular package via python setup.py install or pip install . (run from the repository root).

From PyPi:

pip install sru

Directly use the source without installation:

Make sure this repo and the CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()   # random data; FloatTensor(...) would be uninitialized

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = -2,       # initial bias of highway gate (<= 0)
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
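The shape comments above can be made concrete with a tiny stdlib-only helper (hypothetical, for illustration; not part of the sru package):

```python
def sru_output_shapes(length, batch, hidden, num_layers=2, bidirectional=False):
    """Expected shapes of (output_states, c_states) per the comments above."""
    directions = 2 if bidirectional else 1
    return ((length, batch, directions * hidden),
            (num_layers, batch, directions * hidden))

# The example above: sequence length 20, batch 32, hidden size 128, 2 layers
print(sru_output_shapes(20, 32, 128))
# -> ((20, 32, 128), (2, 32, 128))
```

Note that, unlike nn.LSTM, the output state stacks all timesteps while the c state is reported per layer, so the leading dimensions differ.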

Contributing

Please read and follow the guidelines.

Other Implementations

@musyoku has a very nice SRU implementation in Chainer.

@adrianbg implemented the first CPU version.


More Repositories

1. flambe (Python, 262 stars): An ML framework to accelerate research and its path to production.
2. revisit-bert-finetuning (Python, 184 stars): Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).
3. wav2seq (Python, 81 stars): Official code for Wav2Seq.
4. aum (Python, 71 stars)
5. sew (Python, 69 stars)
6. dialog-intent-induction (Python, 67 stars): Code and data for the paper "Dialog Intent Induction with Deep Multi-View Clustering", Hugh Perkins and Yi Yang, EMNLP 2019.
7. abcd (Python, 57 stars): Official repository for "Action-Based Conversations Dataset: A Corpus for Building More In-Depth Task-Oriented Dialogue Systems".
8. structshot (Python, 53 stars): Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning.
9. slue-toolkit (Python, 47 stars): A toolkit for the Spoken Language Understanding Evaluation (SLUE) benchmark. See https://arxiv.org/abs/2111.10367 for more details. Official website: https://asappresearch.github.io/slue-toolkit/
10. rationale-alignment (Python, 46 stars)
11. flop (Python, 39 stars): PyTorch library for factorized L0-based pruning.
12. multistream-cnn (Shell, 38 stars): Multistream CNN for Robust Acoustic Modeling.
13. clip (Python, 22 stars)
14. emergent-comms-negotiation (Python, 16 stars): Reproduces the ICLR 2018 submission "Emergent Communication through Negotiation".
15. interactive-classification (Python, 13 stars)
16. dynamic-classification (Python, 11 stars): Code from the paper "Metric Learning for Dynamic Text Classification".
17. kbc-pomr (Python, 10 stars): Code for the paper "Knowledge Base Completion for Constructing Problem-Oriented Medical Records" at MLHC 2020.
18. gold (Python, 8 stars): Official repository for "GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation".
19. spoken-ner (Python, 8 stars)
20. constrained-dialogue-generation (Python, 5 stars)
21. texrel (Python, 3 stars): Code for the paper "TexRel: a Green Family of Datasets for Emergent Communications on Relations", Hugh Perkins, 2021 (https://arxiv.org/abs/2105.12804).
22. compositional-inductive-bias (Python, 2 stars): Code for the paper "A Framework for Measuring Compositional Inductive Bias", Hugh Perkins, 2021 (https://arxiv.org/abs/2103.04180).
23. imitkd (Python, 1 star)