NiuTrans.NMT

NiuTrans.NMT

Features

NiuTrans.NMT is a lightweight and efficient Transformer-based neural machine translation system. 中文介绍

Its main features are:

Few dependencies. It is implemented with pure C++, and all dependencies are optional.
High efficiency. It is heavily optimized for fast decoding, see our WMT paper for more details.
Flexible running modes. The system can run with various systems and devices (Linux vs. Windows, CPUs vs. GPUs, and FP32 vs. FP16, etc.).
Framework agnostic. It supports various models trained with other tools, e.g., fairseq models.

Recent Updates

November 2021: Released the code of our submissions to the WMT21 efficiency task. We speed up the inference by 3 times on the GPU (up to 250k words/s on a single NVIDIA A100 GPU card)!

December 2020: Added support for the training of DLCL and RPR Attention

December 2020: Heavily reduced the memory footprint of training by optimizing the backward functions

Installation

Requirements

OS: Linux or Windows
GCC/G++ >=4.8.5 (on Linux)
VC++ >=2015 (on Windows)
cmake >= 3.5
CUDA >= 10.2 (optional)
MKL latest version (optional)
OpenBLAS latest version (optional)

Build from Source

Configure with cmake

The default configuration enables compiling for the pure CPU version.

# Download the code
git clone https://github.com/NiuTrans/NiuTrans.NMT.git
git clone https://github.com/NiuTrans/NiuTensor.git
# Merge with NiuTrans.Tensor
mv NiuTensor/source NiuTrans.NMT/source/niutensor
rm NiuTrans.NMT/source/niutensor/Main.cpp
rm -rf NiuTrans.NMT/source/niutensor/sample NiuTrans.NMT/source/niutensor/tensor/test
mkdir NiuTrans.NMT/build && cd NiuTrans.NMT/build
# Run cmake
cmake ..

You can add compilation options to the cmake command to support accelerations with MKL, OpenBLAS, or CUDA.

Please note that you can only select at most one of MKL or OpenBLAS.

Use CUDA (required for training)

Add -DUSE_CUDA=ON, -DCUDA_TOOLKIT_ROOT=$CUDA_PATH and DGPU_ARCH=$GPU_ARCH to the cmake command, where $CUDA_PATH is the path of the CUDA toolkit and $GPU_ARCH is the GPU architecture.

Supported GPU architectures are listed as below: K：Kepler M：Maxwell P：Pascal V：Volta T：Turing A：Ampere

See the NVIDIA's official page for more details.

You can also add -DUSE_HALF_PRECISION=ON to the cmake command to get half-precision supported.
Use MKL (optional)

Add -DUSE_MKL=ON and -DINTEL_ROOT=$MKL_PATH to the cmake command, where $MKL_PATH is the path of MKL.
Use OpenBLAS (optional)

Add -DUSE_OPENBLAS=ON and -DOPENBLAS_ROOT=$OPENBLAS_PATH to the cmake command, where $OPENBLAS_PATH is the path of OpenBLAS.

Note that half-precision requires Pascal or newer GPU architectures.

Configuration Example

We provide several examples to build the project with different options.

Compile on Linux

make -j && cd ..

Compile on Windows

Add -A 64 to the cmake command and it will generate a visual studio project on windows, i.e., NiuTrans.NMT.sln so you can open & build it with Visual Studio (>= Visual Studio 2015).

If it succeeds, you will get an executable file NiuTrans.NMT in the 'bin' directory.

Usage

Training

Commands

Make sure compiling the program with CUDA because training on CPUs is not supported now.

Step 1: Prepare the training data.

# Convert the BPE vocabulary
python3 tools/GetVocab.py \
  -raw $bpeVocab \
  -new $niutransVocab

Description:

raw - Path of the BPE vocabulary.
new - Path of the NiuTrans.NMT vocabulary to be saved.

# Binarize the training data
python3 tools/PrepareParallelData.py \ 
  -src $srcFile \
  -tgt $tgtFile \
  -sv $srcVocab \
  -tv $tgtVocab \
  -maxsrc 200 \
  -maxtgt 200 \
  -output $trainingFile

Description:

src - Path of the source language data. One sentence per line with tokens separated by spaces or tabs.
tgt - Path of the target language data. The same format as the source language data.
sv - Path of the source language vocabulary. Its first line is the vocabulary size and the first index, followed by a word and its index in each following line.
tv - Path of the target language vocabulary. The same format as the source language vocabulary.
maxsrc - The maximum length of a source sentence. Default: 200.
maxtgt - The maximum length of a target sentence. Default: 200.
output - Path of the training data to be saved.

Step 2: Train the model

bin/NiuTrans.NMT \
  -dev 0 \
  -nepoch 50 \
  -model model.bin \
  -ncheckpoint 10 \
  -train train.data \
  -valid valid.data

Description:

dev - Device id (>= 0 for GPUs). Default: 0.
model - Path of the model to be saved.
train - Path to the training file. The same format as the output file in step 1.
valid - Path to the validation file. The same format as the output file in step 1.
wbatch - Word batch size. Default: 4096.
sbatch - Sentence batch size. Default: 32.
dropout - Dropout rate for the model. Default: 0.3.
fnndrop - Dropout rate for fnn layers. Default: 0.1.
attdrop - Dropout rate for attention layers. Default: 0.1.
lrate- Learning rate. Default: 0.0015.
minlr - The minimum learning rate for training. Default: 1e-9.
warmupinitlr - The initial learning rate for warm-up. Default: 1e-7.
weightdecay - The weight decay factor. Default: 0.
nwarmup - Step number of warm-up for training. Default: 8000.
adam - Indicates whether Adam is used. Default: true.
adambeta1 - Hyper parameters of Adam. Default: 0.9.
adambeta2 - Hyper parameters of Adam. Default: 0.98.
adambeta - Hyper parameters of Adam. Default: 1e-9.
labelsmoothing - Label smoothing factor. Default: 0.1.
updatefreq - Update the model every updatefreq step. Default: 1.
nepoch - The maximum training epoch. Default: 50.
nstep - The maximum traing step. Default: 100000.
ncheckpoint - The maximum checkpoint to be saved. Default: 0.1.

Training Example

Refer to this page for the training example.

Translating

Make sure compiling the program with CUDA and FP16 if you want to translate with FP16 on GPUs.

Commands

bin/NiuTrans.NMT \
 -dev $deviceID \
 -input $inputFile \
 -model $modelPath \
 -wbatch $wordBatchSize \
 -sbatch $sentenceBatchSize \
 -beamsize $beamSize \
 -srcvocab $srcVocab \
 -tgtvocab $tgtVocab \
 -output $outputFile

Description:

model - Path of the model.
sbatch - Sentence batch size. Default: 32.
dev - Device id (-1 for CPUs, and >= 0 for GPUs). Default: 0.
beamsize - Size of the beam. 1 for the greedy search.
input - Path of the input file. One sentence per line with tokens separated by spaces.
output - Path of the output file to be saved. The same format as the input file.
srcvocab - Path of the source language vocabulary. Its first line is the vocabulary size, followed by a word and its index in each following line.
tgtvocab - Path of the target language vocabulary. The same format as the source language vocabulary.
fp16 (optional) - Inference with FP16. This will not work if the model is stored in FP32. Default: false.
lenalpha - The alpha parameter controls the length preference. Default: 0.6.
maxlenalpha - Scalar of the input sequence (for the max number of search steps). Default: 1.2.

An Example

Refer to this page for the translating example.

Low Precision Inference

NiuTrans.NMT supports inference with FP16 and INT8, you can convert the model to FP16 with our tools:

python3 tools/FormatConverter.py \
  -input $inputModel \
  -output $outputModel \ 
  -format $targetFormat

Description:

input - Path of the raw model file.
output - Path of the new model file.
format - Target storage format, FP16 (Default) or FP32.

Converting Models from Fairseq

The core implementation is framework agnostic, so we can easily convert models trained with other frameworks to a binary format for efficient inference.

The following frameworks and models are currently supported:

	fairseq (>=0.6.2)
Transformer (Vaswani et al. 2017)	✓
RPR attention (Shaw et al. 2018)	✓
Deep Transformer (Wang et al. 2019)	✓

Refer to this page for the details about training models with fairseq.

After training, you can convert the fairseq checkpoint and vocabulary with the following steps.

Step 1: Convert parameters of a single fairseq model

python3 tools/ModelConverter.py -i $fairseqCheckpoint -o $niutransModel

Description:

raw - Path of the fairseq checkpoint, refer to this for more details.
new - Path to save the converted model parameters. All parameters are stored in a binary format.
fp16 (optional) - Save the parameters with 16-bit data type. Default: disabled.

Step 2: Convert the vocabulary:

python3 tools/VocabConverter.py -raw $fairseqVocabPath -new $niutransVocabPath

Description:

raw - Path of the fairseq vocabulary, refer to this for more details.
new - Path to save the converted vocabulary. Its first line is the vocabulary size, followed by a word and its index in each following line.

You may need to convert both the source language vocabulary and the target language vocabulary if they are not shared.

A Model Zoo

We provide several pre-trained models to test the system. All models and runnable systems are packaged into docker files so that one can easily reproduce our result.

Refer to this page for more details.

Papers

Here are the papers related to this project:

Learning Deep Transformer Models for Machine Translation. Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao. 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.

The NiuTrans System for WNGT 2020 Efficiency Task. Chi Hu, Bei Li, Yinqiao Li, Ye Lin, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu. 2020. Proceedings of the Fourth Workshop on Neural Generation and Translation.

The NiuTrans System for the WMT21 Efficiency Task. Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Minyi Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu. 2020.

Team Members

This project is maintained by a joint team from NiuTrans Research and NEU NLP Lab. Current team members are

Chi Hu, Chenglong Wang, Siming Wu, Bei Li, Yinqiao Li, Ye Lin, Quan Du, Tong Xiao and Jingbo Zhu

Feel free to contact huchinlp[at]gmail.com or niutrans[at]mail.neu.edu.cn if you have any questions.

NiuTrans/NiuTrans.NMT

NiuTrans

Reviews

Repository Details

NiuTrans.NMT

Features

Recent Updates

Installation

Requirements

Build from Source

Configure with cmake

Configuration Example

Compile on Linux

Compile on Windows

Usage

Training

Commands

Training Example

Translating

Commands

An Example

Low Precision Inference

Converting Models from Fairseq

A Model Zoo

Papers

Team Members

More Repositories