  • Stars: 110
  • Rank: 316,770 (Top 7%)
  • Language: C#
  • Created: about 7 years ago
  • Updated: over 2 years ago

Repository Details

meProp: Sparsified Back Propagation for Accelerated Deep Learning (ICML 2017)

meProp

This code was used for the paper meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting (ICML 2017) [pdf] by Xu Sun, Xuancheng Ren, Shuming Ma, and Houfeng Wang.

Based on meProp, we further simplify the model by eliminating the rows or columns that are seldom updated, which reduces the computational cost of both training and decoding and can accelerate decoding in real-world applications. We name this method meSimp (minimal effort simplification). For more details, please see the paper Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method [pdf]. The code is available at [here].

Introduction

We propose a simple yet effective technique to simplify the training of neural networks. The technique is based on the top-k selection of the gradients in back propagation.

In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. We name this method meProp (minimal effort back propagation).
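
For illustration, the following is a minimal sketch of how such top-k gradient selection could be written as a custom PyTorch autograd function. This is not the repository's implementation: the name TopKGrad and the use of the modern torch.autograd.Function API are assumptions made only for this example.

import torch

class TopKGrad(torch.autograd.Function):
    # Identity in the forward pass; in the backward pass, keep only the
    # k largest-magnitude entries of the incoming gradient per example,
    # so only the corresponding rows/columns of the preceding weight
    # matrix receive updates.
    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        k = ctx.k
        if k <= 0 or k >= grad_output.size(-1):
            return grad_output, None
        # indices of the k largest-magnitude gradient entries per example
        _, idx = grad_output.abs().topk(k, dim=-1)
        mask = torch.zeros_like(grad_output)
        mask.scatter_(-1, idx, 1.0)
        return grad_output * mask, None

# usage sketch: h = TopKGrad.apply(linear_layer(x), k)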

Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5% of the weights in each back propagation pass. More interestingly, the proposed method improves the accuracy of the resulting models rather than degrading it; a detailed analysis is given in the paper.

The following figure illustrates the idea of meProp.

[Figure: an illustration of the idea of meProp]

TL;DR: Training with meProp is significantly faster than the original back propagation and achieves better accuracy on all three tasks we evaluated: Dependency Parsing, POS Tagging, and MNIST. The method works with different neural models (MLP and LSTM), with different optimizers (we tested AdaGrad and Adam), with dropout, and with more hidden layers. Top-k selection works better than random-k selection, and better than a normally trained k-dimensional network.

Update: Results on test set (please refer to the paper for detailed results and experimental settings):

Method (Adam, CPU)       | Backprop Time (s) | Test (%)
Parsing (MLP 500d)       | 9,078             | 89.80
Parsing (meProp top-20)  | 489 (18.6x)       | 89.84 (+0.04)
POS-Tag (LSTM 500d)      | 16,167            | 97.22
POS-Tag (meProp top-10)  | 436 (37.1x)       | 97.25 (+0.03)
MNIST (MLP 500d)         | 170               | 98.20
MNIST (meProp top-80)    | 29 (5.9x)         | 98.27 (+0.07)

The effect of k, selection (top-k vs. random), and network dimension (top-k vs. k-dimensional):

[Figure: effect of k]

To achieve speedups on GPUs, a slight change is made to unify the top-k pattern across the mini-batch. The original meProp produces a different top-k pattern for each example in a mini-batch, which requires sparse matrix multiplication, and sparse matrix multiplication is not very efficient on GPUs compared to dense matrix multiplication. By unifying the top-k pattern, we can extract the parts of the matrices that need computation (dense matrices), compute the results, and reconstruct them to the appropriate size for further computation. This leads to actual speedups on GPUs, although we believe a better-designed method could yield even larger speedups.
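
As a rough sketch of the unified approach (illustrative only, not the repository's implementation; it assumes a single linear layer y = x W with gradient grad_y of shape [batch, d_out]), one index set is chosen for the whole mini-batch so the backward pass reduces to small dense multiplications:

import torch

def unified_meprop_backward(x, W, grad_y, k):
    # Shapes: x [b, d_in], W [d_in, d_out], grad_y [b, d_out].
    # Choose one index set for the whole batch: the k output units
    # with the largest total gradient magnitude.
    scores = grad_y.abs().sum(dim=0)          # [d_out]
    _, idx = scores.topk(k)                   # [k]
    grad_y_small = grad_y[:, idx]             # dense [b, k]
    # Dense multiplications on the reduced matrices.
    grad_x = grad_y_small @ W[:, idx].t()     # [b, d_in]
    grad_W_small = x.t() @ grad_y_small       # [d_in, k]
    # Scatter back to the full weight-gradient size.
    grad_W = torch.zeros_like(W)
    grad_W[:, idx] = grad_W_small
    return grad_x, grad_W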

See [pdf] for more details, experimental results, and analysis.

Usage

PyTorch

Requirements

  • Python 3.5
  • PyTorch v0.1.12 to v0.3.1
  • torchvision
  • CUDA 8.0

Dataset

MNIST: The code automatically downloads and processes the dataset (using torchvision). See the function get_mnist in the PyTorch code for more information.
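
For reference, a typical torchvision-based loader looks roughly like the following. This is only a sketch of what get_mnist does; the actual function in the repository may differ in transforms, paths, and defaults.

import torch
from torchvision import datasets, transforms

def get_mnist(batch_size=32, data_dir='./data'):
    # Downloads MNIST on first use and wraps the splits in DataLoaders.
    transform = transforms.ToTensor()
    train = datasets.MNIST(data_dir, train=True, download=True, transform=transform)
    test = datasets.MNIST(data_dir, train=False, download=True, transform=transform)
    return (torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True),
            torch.utils.data.DataLoader(test, batch_size=batch_size))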

Run

python3.5 main.py

The code runs unified meProp by default. You can change the lines at the bottom of main.py to run meProp with sparse matrix multiplication, or pass arguments on the command line.

usage: main.py [-h] [--n_epoch N_EPOCH] [--d_hidden D_HIDDEN]
               [--n_layer N_LAYER] [--d_minibatch D_MINIBATCH]
               [--dropout DROPOUT] [--k K] [--unified] [--no-unified]
               [--random_seed RANDOM_SEED]

optional arguments:
  -h, --help            show this help message and exit
  --n_epoch N_EPOCH     number of training epochs
  --d_hidden D_HIDDEN   dimension of hidden layers
  --n_layer N_LAYER     number of layers, including the output layer
  --d_minibatch D_MINIBATCH
                        size of minibatches
  --dropout DROPOUT     dropout rate
  --k K                 k in meProp (if invalid, e.g. 0, do not use meProp)
  --unified             use unified meProp
  --no-unified          do not use unified meProp
  --random_seed RANDOM_SEED
                        random seed
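
For example, to train a 3-layer, 500-dimensional MLP with unified meProp and top-80 selection (illustrative argument values only):

python3.5 main.py --n_epoch 20 --d_hidden 500 --n_layer 3 --k 80 --unified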

The results are written to stdout by default, but you can change the file argument when initializing the TestGroup to write the results to a file.

The code also supports simple unified meProp. Note that the code uses GPU 0 by default.

C#

Requirements

  • Targeting Microsoft .NET Framework 4.6.1+
  • Compatible versions of Mono should work fine (tested with Mono 5.0.1)
  • Developed with Microsoft Visual Studio 2017

Dataset

MNIST: Download from link. Extract the files and place them in the same directory as the executable.

Run

Compile the code first, or use the executable provided in releases.

Then

nnmnist.exe <config.json>

or

mono nnmnist.exe <config.json>

where <config.json> is a configuration file. There is an example configuration file in the source code; it runs the baseline model. Change NetType to mlptop to experiment with meProp, or to mlpvar to experiment with meSimp. The output is written to a file in the same directory as the executable.
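
For instance, switching the configuration to meProp might only require changing the NetType field (shown in isolation here; all other fields should be taken from the example configuration file shipped with the source):

{
  "NetType": "mlptop"
}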

The code also supports random k selection.

Citation

bibtex:

@InProceedings{sun17meprop,
  title     = {me{P}rop: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting},
  author    = {Xu Sun and Xuancheng Ren and Shuming Ma and Houfeng Wang},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3299--3308},
  year      = {2017},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  address   = {International Convention Centre, Sydney, Australia}
}

More Repositories

1. pkuseg-python - The pkuseg toolkit for multi-domain Chinese word segmentation (Python, 6,526 stars)
2. SGM - Sequence Generation Model for Multi-label Classification (COLING 2018) (Python, 432 stars)
3. Chinese-Literature-NER-RE-Dataset - A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text (406 stars)
4. Global-Encoding - Global Encoding for Abstractive Summarization (ACL 2018) (Python, 275 stars)
5. Graph-to-seq-comment-generation - Code for the paper "Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model" (Python, 175 stars)
6. SU4MLC - Code for the article "Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification" (EMNLP 2018) (Python, 154 stars)
7. label-words-are-anchors - Repository for "Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning" (Python, 148 stars)
8. DPGAN - Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text (EMNLP 2018) (Python, 146 stars)
9. superAE - Code for "Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization" (Python, 137 stars)
10. AdaMod - Adaptive and Momental Bounds for Adaptive Learning Rate Methods (Python, 126 stars)
11. text-autoaugment - Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification (EMNLP 2021) (Python, 125 stars)
12. livebot - LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts (AAAI 2019) (Python, 122 stars)
13. Unpaired-Sentiment-Translation - Code for "Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach" (ACL 2018) (Python, 109 stars)
14. WEAN - Code for "Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation" (NAACL 2018) (Python, 93 stars)
15. label-embedding-network - Label Embedding Network (Python, 90 stars)
16. Prime - A simple module that consistently outperforms self-attention and the Transformer model on main NMT datasets with SoTA performance (Python, 86 stars)
17. AAPR - Automatic Academic Paper Rating: Data and Model (ACL 2018) (Python, 72 stars)
18. Skeleton-Based-Generation-Model - Code for "A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation" (EMNLP 2018) (Python, 65 stars)
19. Explicit-Sparse-Transformer - Code for Explicit Sparse Transformer (Python, 57 stars)
20. SMAE - Code for "Learning Sentiment Memories for Sentiment Modification without Parallel Data" (Python, 55 stars)
21. LancoSum - A toolkit for abstractive summarization that makes it easy to implement the baseline and our proposed models, which can achieve SOTA performance (Python, 50 stars)
22. Seq2Set - Code for the paper "A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification" (Python, 50 stars)
23. AMM - Code for "An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation" (EMNLP 2018) (Python, 48 stars)
24. AdaNorm - Code for "Understanding and Improving Layer Normalization" (Python, 45 stars)
25. bag-of-words - Code for "Bag-of-Words as Target for Neural Machine Translation" (Python, 45 stars)
26. well-classified-examples-are-underestimated - Code for the AAAI 2022 publication "Well-classified Examples are Underestimated in Classification with Deep Neural Networks" (Jupyter Notebook, 42 stars)
27. SRB - Code for "Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization" (Python, 41 stars)
28. DynamicKD - Code for the EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" (Python, 40 stars)
29. agent-backdoor-attacks - Code and data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" (NeurIPS 2024) (Python, 39 stars)
30. simNet - Code for "simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions" (EMNLP 2018) (Python, 37 stars)
31. Embedding-Poisoning - Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-HLT 2021) (Python, 36 stars)
32. IAIS - Learning Relation Alignment for Calibrated Cross-modal Retrieval (ACL 2021) (Python, 30 stars)
33. codable-watermarking-for-llm - Repository for "Towards Codable Watermarking for Large Language Models" (Python, 27 stars)
34. Chinese-Dependency-Treebank-with-Ellipsis - An Ellipsis-aware Chinese Dependency Treebank for Web Text (Python, 26 stars)
35. DeconvDec - Code for "Deconvolution-Based Global Decoding for Neural Machine Translation" (COLING 2018) (Python, 26 stars)
36. tcm_prescription_generation - Code for "Exploration on Generating Traditional Chinese Medicine Prescriptions from Symptoms with an End-to-End Approach" (Python, 26 stars)
37. CGM - Code for the IJCAI 2021 main conference paper "Long-term, Short-term and Sudden Event: Trading Volume Movement Prediction with Graph-based Multi-view Modeling" (Python, 23 stars)
38. HSSC - Code for "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification" (IJCAI 2018) (Python, 23 stars)
39. RAP - Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021) (Python, 22 stars)
40. clip-openness - Delving into the Openness of CLIP (ACL 2023) (Python, 22 stars)
41. SOS - Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021) (Jupyter Notebook, 21 stars)
42. CMAC - The dataset and code for the paper "Cross-Modal Commentator: Automatic Machine Commenting Based on Cross-Modal Information" (Python, 20 stars)
43. MUKI - From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models (Findings of EMNLP 2022) (Python, 19 stars)
44. ChineseNER - Code for "Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media: A Unified Model" (Python, 18 stars)
45. meSimp - Code for "Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method" (C#, 18 stars)
46. Avg-Avg - Holistic Sentence Embeddings for Better Out-of-Distribution Detection (Findings of EMNLP 2022) (Python, 18 stars)
47. Pivot - Code for "Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation" (ACL 2019) (Python, 17 stars)
48. LexicalAT - Code for the paper "LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification" (Python, 16 stars)
49. RMSC - Data and code for the paper "Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations" (Python, 14 stars)
50. Decode-CRF - Conditional Random Fields with Decode-based Learning (C#, 14 stars)
51. nndep - Transition-based Dependency Parser with neural networks and hybrid oracle (C#, 13 stars)
52. SACT - Code for the article "Automatic Temperature Control for Neural Machine Translation" (EMNLP 2018) (Python, 13 stars)
53. SAPO - C# code for "Towards Easier and Faster Sequence Labeling for Natural Language Processing: A Search-based Probabilistic Online Learning Framework (SAPO)" (Information Sciences) (C#, 13 stars)
54. Augmented_Data_for_FST - The augmented data of the paper "Parallel Data Augmentation for Formality Style Transfer" (ACL 2020) (12 stars)
55. ACA4NMT - Code of a novel model for NMT (Python, 11 stars)
56. DCKD - Code and data for "Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction" (ECML-PKDD 2022) (Python, 11 stars)
57. CascadeBERT - Code for CascadeBERT (Findings of EMNLP 2021) (Python, 11 stars)
58. FedMNMT - Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter (Findings of ACL 2023) (Python, 10 stars)
59. Multi-Order-LSTM - Code for "Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?" (Python, 9 stars)
60. SemPre - Towards Semantics-Enhanced Pre-Training: Can Lexicon Definitions Help Learning Sentence Meanings? (AAAI 2021) (Python, 9 stars)
61. DAN - Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks (Findings of EMNLP 2022) (Python, 9 stars)
62. Early-Exit - Code for the paper "A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models" (Python, 8 stars)
63. CVST - Code for the paper "Knowledgeable Storyteller: A Commonsense-Driven Generative Model for Visual Storytelling" (7 stars)
64. Multi-Task-Learning - Online Multi-Task Learning Toolkit based on C#; code for "Large-Scale Personalized Human Activity Recognition using Online Multi-Task Learning" (TKDE) (C#, 6 stars)
65. NLP_Code_Index - Codes and papers from @lancopku (5 stars)
66. GKD (Python, 5 stars)
67. CRF-ADF - CRF Toolkit based on C#; supports ADF (Adaptive stochastic gradient Descent based on Feature-frequency information, ACL 2012) (C#, 4 stars)
68. Sememe_prediction - Code for the paper "Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions" (Python, 3 stars)
69. LPVDN - Python code for the paper "Learning Robust Representation for Clustering through Locality Preserving Variational Discriminative Network" (Python, 3 stars)
70. Attention-Augmentation (Python, 2 stars)
71. GNOME - Code for the EACL 2023 paper "Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features" (1 star)
72. MR-VPC - Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality (1 star)