

Machine Translation Reading List

This is a machine translation reading list maintained by the Tsinghua Natural Language Processing Group.

The past three decades have witnessed the rapid development of machine translation, especially data-driven approaches such as statistical machine translation (SMT) and neural machine translation (NMT). Given the current dominance of NMT, priority is given to collecting important, up-to-date NMT papers; the Edinburgh/JHU MT research survey wiki offers good coverage of older papers, with a brief description of each sub-topic of MT. Our list is still incomplete and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!

10 Must Reads

Tutorials and Surveys

Statistical Machine Translation

Word-based Models

Phrase-based Models

Syntax-based Models

Discriminative Training

System Combination

Human-centered SMT

Interactive SMT

Adaptation

Evaluation

Neural Machine Translation

Model Architecture

Attention Mechanism

Open Vocabulary

Training Objectives and Frameworks

Decoding

Low-resource Language Translation

Semi-supervised Learning

Unsupervised Learning

Pivot-based Methods

Data Augmentation Methods

Data Selection Methods

Transfer Learning

Meta Learning

Multilingual Machine Translation

Prior Knowledge Integration

Word/Phrase Constraints

Syntactic/Semantic Constraints

Coverage Constraints

Document-level Translation

Robustness

Interpretability

Linguistic Interpretation

Fairness and Diversity

Efficiency

Pre-Training

Non-Autoregressive Translation

Speech Translation and Simultaneous Translation

Multi-modality

Ensemble and Reranking

Domain Adaptation

Quality Estimation

Human-centered NMT

Interactive NMT

Automatic Post-Editing

Poetry Translation

Eco-friendly

Compositional Generalization

Endangered Language Revitalization

Word Translation

WMT Winners

WMT is the most important annual international competition on machine translation. We collect the competition results for the news translation task since WMT 2016 (the First Conference on Machine Translation) and summarize the techniques used in the top-performing systems. Currently, we focus on four translation directions: ZH-EN, EN-ZH, DE-EN, and EN-DE. The summarized techniques may be incomplete; suggestions are welcome!

WMT 2019

WMT 2018

  • The winner of ZH-EN: Tencent

    • System report: Mingxuan Wang, Li Gong, Wenhuan Zhu, Jun Xie, and Chao Bian. 2018. Tencent Neural Machine Translation Systems for WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: RNMT + Transformer + BPE + Reranking ensemble outputs with 48 features (including T2T R2L, T2T L2R, RNN L2R, RNN R2L, etc.) + Back-Translation + Joint training with English-to-Chinese systems + Fine-tuning with selected data + Knowledge distillation
  • The winner of EN-ZH: GTCOM

    • System report: Chao Bei, Hao Zong, Yiming Wang, Baoyong Fan, Shiqi Li, and Conghu Yuan. 2018. An Empirical Study of Machine Translation for the Shared Task of WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: Transformer + Back-Translation + Data Filtering by rules, language models and translation models + BPE + Greedy Ensemble Decoding + Fine-Tuning with newstest2017 back translation
  • The winner of DE-EN: RWTH Aachen University

    • System report: Julian Schamper, Jan Rosendahl, Parnia Bahar, Yunsu Kim, Arne Nix, and Hermann Ney. 2018. The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers.
    • Techniques: Ensemble of the 3 strongest Transformer models + Data Selection + BPE + Fine-Tuning + Important hyperparameters (batch size and model dimension)
  • The winner of EN-DE: Microsoft
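
Nearly every winning system above uses BPE (byte pair encoding) subword segmentation. As a minimal sketch of the idea, the merge loop below repeatedly fuses the most frequent adjacent symbol pair; the toy vocabulary and merge count are purely illustrative, not any team's actual configuration:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with a single merged symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    merged = ''.join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Words are pre-split into characters, with an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(3):  # learn three merge operations
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
# After three merges, "est</w>" has become a single subword unit.
```

In a real system the learned merge operations are then applied to segment training and test data, so frequent words stay intact while rare words decompose into known subwords.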

WMT 2017

  • The winner of ZH-EN: Sogou

    • System report: Yuguang Wang, Shanbo Cheng, Liyang Jiang, Jiajun Yang, Wei Chen, Muze Li, Lin Shi, Yanfeng Wang, and Hongtao Yang. 2017. Sogou Neural Machine Translation Systems for WMT17. In Proceedings of the Second Conference on Machine Translation: Shared Task Papers.
    • Techniques: Encoder-Decoder with Attention + BPE + Reranking (R2L, T2S, N-gram language models) + Tagging Model + Named Entity Translation + Ensemble
  • The winner of EN-ZH, DE-EN and EN-DE: University of Edinburgh

    • System report: Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, and Philip Williams. 2017. The University of Edinburgh’s Neural MT Systems for WMT17. In Proceedings of the Second Conference on Machine Translation: Shared Task Papers.
    • Techniques: Encoder-Decoder with Attention + Deep Model + Layer Normalization + Weight Tying + Back-Translation + BPE + Reranking (L2R, R2L) + Ensemble
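
The L2R/R2L reranking used by both Sogou and Edinburgh scores each candidate in an n-best list with several models and returns the candidate with the best combined score. A minimal sketch; the candidate names, scores, and weights below are purely illustrative:

```python
def rerank(candidates, scorers, weights):
    """Return the candidate with the highest weighted sum of model scores."""
    def combined(cand):
        return sum(w * score(cand) for score, w in zip(scorers, weights))
    return max(candidates, key=combined)

# Toy log-probability scores from a left-to-right and a right-to-left model.
l2r = {'hyp_a': -1.2, 'hyp_b': -0.9, 'hyp_c': -1.5}.get
r2l = {'hyp_a': -0.8, 'hyp_b': -1.6, 'hyp_c': -1.0}.get
best = rerank(['hyp_a', 'hyp_b', 'hyp_c'], [l2r, r2l], [0.5, 0.5])
```

The real systems tune the feature weights on a development set (e.g. with MERT or k-best batch MIRA) rather than fixing them by hand.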

WMT 2016

  • The winner of DE-EN: University of Regensburg

    • System report: not found
    • Techniques: not found
  • The winner of EN-DE: University of Edinburgh

More Repositories

1. THUMT: An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group (Python, 691 stars)
2. TG-Reading-List: A text generation reading list maintained by Tsinghua Natural Language Processing Group (TeX, 444 stars)
3. Document-Transformer: Improving the Transformer translation model with document-level context (Python, 171 stars)
4. dyMEAN: Code for our paper "End-to-End Full-Atom Antibody Design" (Python, 79 stars)
5. MEAN: Code for our paper "Conditional Antibody Design as 3D Equivariant Graph Translation" (Python, 74 stars)
6. Mask-Align: Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" (ACL 2021) (Python, 58 stars)
7. THUCC: An open-source classical Chinese information processing toolkit developed by Tsinghua Natural Language Processing Group (Python, 48 stars)
8. PS-VAE: Code for our paper "Molecule Generation by Principal Subgraph Mining and Assembling" (Python, 29 stars)
9. Template-NMT (Python, 22 stars)
10. PLM4MT: Code for our paper "MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators" (ACL 2022) (Python, 20 stars)
11. UCE4BT (Python, 19 stars)
12. MT-Toolkit-List: A list of machine translation open-source toolkits maintained by Tsinghua Natural Language Processing Group (13 stars)
13. PR4NMT: Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization (Python, 12 stars)
14. L2Copy4APE: Learning to Copy for Automatic Post-Editing (EMNLP 2019) (Python, 11 stars)
15. TRICE: Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" (ACL 2021) (Python, 11 stars)
16. DirectQuote: A dataset for direct quotation extraction and attribution in news articles (11 stars)
17. SKR: Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023) (Python, 11 stars)
18. UBiLexAT: An unsupervised bilingual lexicon inducer from non-parallel data by adversarial training (Python, 8 stars)
19. PromptGating4MCTG: Code for our paper "An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation" (ACL 2023) (Python, 8 stars)
20. PGRA: Prompt-Guided Retrieval for Non-Knowledge-Intensive Tasks (Python, 7 stars)
21. DBKD-PLM: Code for our paper "Bridging the Gap between Decision and Logits in Decision-based Knowledge Distillation for Pre-trained Language Models" (ACL 2023) (Python, 6 stars)
22. FIIG: Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions (EMNLP 2023 Findings) (6 stars)
23. BiLex: A bilingual lexicon inducer from non-parallel data (C, 5 stars)
24. UBiLexEMD: An unsupervised bilingual lexicon inducer from non-parallel data by Earth Mover's Distance minimization (Python, 5 stars)
25. SelfSupervisedQE: Self-Supervised Quality Estimation for Machine Translation (Python, 5 stars)
26. symbol2language: Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models (5 stars)
27. TRAN: Code for our paper "Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation" (EMNLP 2023) (Python, 5 stars)
28. Voting4SC: Modeling Voting for System Combination in Machine Translation (IJCAI 2020) (Python, 4 stars)
29. ktnmt (Python, 4 stars)
30. CODIS: Code for our paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models" (JavaScript, 4 stars)
31. ModelCompose (3 stars)
32. MetaRanking: Official code for our paper "Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement" (2 stars)
33. MT-Dataset-List: A list of machine translation datasets maintained by Tsinghua Natural Language Processing Group (2 stars)
34. DEEM (2 stars)
35. ROGO: Code for our paper "Restricted orthogonal gradient projection for continual learning" (Python, 1 star)
36. Brote (Python, 1 star)
37. Transformer-DMB: Code for our paper "Dynamic Multi-Branch Layers for On-Device Neural Machine Translation" (TASLP) (Python, 1 star)
38. CKD: Continual Knowledge Distillation for Neural Machine Translation (Python, 1 star)