AdaMod
An optimizer that exerts adaptive, momental upper bounds on individual learning rates to prevent them from becoming undesirably larger than what the historical statistics suggest, thereby avoiding the non-convergence issue and leading to better performance. Strong empirical results on many deep learning applications demonstrate the effectiveness of the proposed method, especially on complex networks such as DenseNet and Transformer.
Based on Ding et al. (2019). An Adaptive and Momental Bound Method for Stochastic Learning.
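To make the bounding idea concrete, here is a minimal single-tensor sketch of an AdaMod-style update: the usual Adam step size is computed element-wise, then capped by its own exponential moving average controlled by beta3. The function adamod_step, its state dictionary, and the exact bias-correction arithmetic are illustrative assumptions, not the packaged optimizer.

import torch

def adamod_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), beta3=0.999, eps=1e-8):
    # Hypothetical single-tensor sketch; the pip package provides the real optimizer class.
    beta1, beta2 = betas
    state['step'] += 1
    m, v, s = state['m'], state['v'], state['s']
    # Standard Adam first and second moment estimates.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias1 = 1 - beta1 ** state['step']
    bias2 = 1 - beta2 ** state['step']
    # Element-wise Adam step size (bias corrections folded in).
    eta = lr * (bias2 ** 0.5) / bias1 / (v.sqrt() + eps)
    # Momental bound: cap the step size by its exponential moving average.
    s.mul_(beta3).add_(eta, alpha=1 - beta3)
    eta = torch.min(eta, s)
    param.sub_(eta * m)

# Toy usage:
w = torch.zeros(3)
state = {'step': 0, 'm': torch.zeros_like(w), 'v': torch.zeros_like(w), 's': torch.zeros_like(w)}
adamod_step(w, torch.randn(3), state)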
Installation
AdaMod requires Python 3.6.0 or later.
Installing via pip
The preferred way to install AdaMod is via pip
with a virtual environment.
Just run
pip install adamod
in your Python environment and you are ready to go!
Using source code
As AdaMod is a single Python class of roughly 100 lines, an alternative is to download adamod.py directly and copy it into your project.
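If you go the copy-the-file route, the import works the same way as with the pip package, assuming you keep the file name adamod.py next to your training script:

import adamod
optimizer = adamod.AdaMod(model.parameters(), lr=1e-3, beta3=0.999)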
Usage
You can use AdaMod just like any other PyTorch optimizers.
optimizer = adamod.AdaMod(model.parameters(), lr=1e-3, beta3=0.999)
As described in the paper, AdaMod smooths out unexpectedly large learning rates throughout the training process. The beta3
parameter is the smoothing coefficient for the actual learning rate, which controls the averaging range. In common cases, a beta3
in {0.999, 0.9999}
achieves relatively good and stable results. See the paper for more details.
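As a quick illustration, a standard PyTorch training step with AdaMod looks like with any other optimizer; the model, data, and loss below are placeholders chosen for the example:

import torch
import torch.nn as nn
import adamod

model = nn.Linear(10, 1)                      # placeholder model
optimizer = adamod.AdaMod(model.parameters(), lr=1e-3, beta3=0.999)
criterion = nn.MSELoss()

for step in range(100):
    inputs = torch.randn(32, 10)              # placeholder batch
    targets = torch.randn(32, 1)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()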
Citation
If you use AdaMod in your research, please cite An Adaptive and Momental Bound Method for Stochastic Learning. Thanks!
@article{ding2019adaptive,
title={An Adaptive and Momental Bound Method for Stochastic Learning},
author={Jianbang Ding and Xuancheng Ren and Ruixuan Luo and Xu Sun},
journal={arXiv preprint arXiv:1910.12249},
year={2019}
}
Demos
For the full list of demos, please refer to this page.