THUMT: An Open Source Toolkit for Neural Machine Translation
Contents
- Introduction
- Online Demo
- Implementations
- Notable Features
- Documentation
- License
- Citation
- Development Team
- Contact
- Derivative Repositories
Introduction
Machine translation is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.
THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University. The website of THUMT is: http://thumt.thunlp.org/.
Online Demo
The online demo of THUMT is available at http://translate.thumt.cn/. The languages involved include Ancient Chinese, Arabic, Chinese, English, French, German, Indonesian, Japanese, Portuguese, Russian, and Spanish.
Implementations
THUMT currently has three main implementations:
- THUMT-PyTorch: a new implementation developed with PyTorch. It implements the Transformer model (Transformer) (Vaswani et al., 2017).
- THUMT-TensorFlow: an implementation developed with TensorFlow. It implements the sequence-to-sequence model (Seq2Seq) (Sutskever et al., 2014), the standard attention-based model (RNNsearch) (Bahdanau et al., 2014), and the Transformer model (Transformer) (Vaswani et al., 2017).
- THUMT-Theano: the original project developed with Theano, which is no longer updated because MILA has ended development of Theano. It implements the standard attention-based model (RNNsearch) (Bahdanau et al., 2014), minimum risk training (MRT) (Shen et al., 2016) for optimizing model parameters with respect to evaluation metrics, semi-supervised training (SST) (Cheng et al., 2016) for exploiting monolingual corpora to learn bi-directional translation models, and layer-wise relevance propagation (LRP) (Ding et al., 2017) for visualizing and analyzing RNNsearch.
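As a rough illustration of what MRT optimizes, the sketch below computes the expected risk over a small set of candidate translations, following the general formulation in Shen et al. (2016): model probabilities are scaled by a sharpness factor, renormalized over the candidate set, and used to weight each candidate's loss against the reference. The function name `mrt_expected_risk`, its parameters, and the default `alpha` are assumptions made for this example; they are not taken from the THUMT code base.

```python
import math


def mrt_expected_risk(log_probs, risks, alpha=1.0):
    """Expected risk over a candidate set (illustrative sketch, not THUMT's API).

    log_probs: model log-probabilities log P(y|x), one per candidate.
    risks:     loss of each candidate against the reference,
               for example 1 - sentence-level BLEU.
    alpha:     sharpness of the renormalized distribution (hypothetical default).
    """
    if len(log_probs) != len(risks) or not log_probs:
        raise ValueError("log_probs and risks must be non-empty and equal in length")

    # Q(y|x) is proportional to P(y|x)^alpha, renormalized over the candidates.
    scaled = [alpha * lp for lp in log_probs]
    max_scaled = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - max_scaled) for s in scaled]
    total = sum(weights)
    q = [w / total for w in weights]

    # The training objective is the Q-weighted average of the candidate risks.
    return sum(qi * ri for qi, ri in zip(q, risks))


if __name__ == "__main__":
    # Two equally likely candidates: the expected risk is the mean of their risks.
    print(mrt_expected_risk([-1.0, -1.0], [0.2, 0.6]))
```

In actual training the candidates would be sampled from the model and the expected risk would be differentiated with respect to the model parameters; this sketch only shows the quantity being minimized.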
The following table summarizes the features of the three implementations:
Implementation | Model | Criterion | Optimizer | LRP |
---|---|---|---|---|
Theano | RNNsearch | MLE, MRT, SST | SGD, AdaDelta, Adam | RNNsearch |
TensorFlow | Seq2Seq, RNNsearch, Transformer | MLE | Adam | RNNsearch, Transformer |
PyTorch | Transformer | MLE | SGD, Adadelta, Adam | N.A. |
We recommend using THUMT-PyTorch or THUMT-TensorFlow, both of which deliver better translation performance than THUMT-Theano. We will keep adding new features to THUMT-PyTorch and THUMT-TensorFlow.
Notable Features
- Transformer (Vaswani et al., 2017)
- Multi-GPU training & decoding
- Multi-worker distributed training
- Mixed precision training & decoding
- Model ensemble & averaging
- Gradient aggregation
- TensorBoard for visualization
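To make the "model averaging" item concrete, here is a minimal sketch of checkpoint averaging: the parameters saved at several training steps are averaged element-wise to produce a single set of weights. It uses plain Python lists in place of tensors, and the function name `average_checkpoints` and the checkpoint layout (a dict mapping parameter names to flat lists of floats) are assumptions for illustration, not THUMT's actual interface.

```python
def average_checkpoints(checkpoints):
    """Average parameters element-wise across checkpoints (illustrative sketch).

    checkpoints: list of dicts mapping a parameter name to a flat list of floats.
                 All checkpoints are assumed to share the same names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")

    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must contain the same parameter names")

    averaged = {}
    for name in names:
        vectors = [ckpt[name] for ckpt in checkpoints]
        # zip(*vectors) groups the i-th value of every checkpoint together.
        averaged[name] = [sum(values) / len(values) for values in zip(*vectors)]
    return averaged


if __name__ == "__main__":
    ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
    ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
    print(average_checkpoints([ckpt_a, ckpt_b]))
```

With real tensors the same idea applies per parameter; averaging the last few checkpoints is a common way to smooth out noise from the final training steps.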
Documentation
The documentation for the PyTorch implementation is available here.
License
The source code is dual-licensed. Open-source licensing is under the BSD-3-Clause license, which allows free use for research purposes. For commercial licensing, please email [email protected].
Citation
Please cite the following papers:
Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Maosong Sun, Huanbo Luan, Yang Liu. 2020. THUMT: An Open Source Toolkit for Neural Machine Translation. AMTA 2020.
Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.
Development Team
Project leaders: Maosong Sun, Yang Liu, Huanbo Luan
Project members:
Theano: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng
TensorFlow: Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Zonghan Yang
PyTorch: Zhixing Tan, Gang Chen
Contact
If you have questions, suggestions, or bug reports, please email [email protected].
Derivative Repositories
- UCE4BT (Improving Back-Translation with Uncertainty-based Confidence Estimation)
- L2Copy4APE (Learning to Copy for Automatic Post-Editing)
- Document-Transformer (Improving the Transformer Translation Model with Document-Level Context)
- PR4NMT (Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization)