  • Language: Python
  • License: Apache License 2.0

Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

This work proposes a generative paradigm for translation tasks that leverages LLMs to generate higher-quality translations from the N-best hypotheses decoded by a foundation model (e.g., SeamlessM4T-Large-V2). We also release the HypoTranslate dataset to support LLM fine-tuning. It contains over 592K pairs of N-best hypotheses and ground-truth translations in 11 languages. Experiments show that GenTranslate significantly outperforms the state-of-the-art SeamlessM4T-Large-V2 on various speech and machine translation benchmarks.
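To make the N-best idea concrete, here is a minimal Python sketch of how a list of N-best translation hypotheses could be packed into a single LLM prompt. The function name build_nbest_prompt and the prompt wording are hypothetical illustrations. They are not the template used in lit_gpt/gentrans.py, which may feed the hypotheses to the model differently.

```python
from typing import List


def build_nbest_prompt(hypotheses: List[str], src_lang: str, tgt_lang: str) -> str:
    """Pack N-best translation hypotheses into one prompt (hypothetical template)."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    lines = [
        f"Below are the {len(hypotheses)}-best {tgt_lang} translation hypotheses "
        f"of a {src_lang} input. Generate the best {tgt_lang} translation."
    ]
    # Number each hypothesis so the LLM can compare the candidates.
    for rank, hyp in enumerate(hypotheses, start=1):
        lines.append(f"{rank}. {hyp.strip()}")
    # The LLM's continuation after this cue would be taken as the output.
    lines.append("Translation:")
    return "\n".join(lines)


if __name__ == "__main__":
    nbest = ["hello to the world", "hello world", "hi world"]
    print(build_nbest_prompt(nbest, "French", "English"))
```

Because the output is generated rather than selected, the LLM is not restricted to returning one of the N hypotheses verbatim, which is what distinguishes this setup from plain N-best reranking.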

TIP: At this time (before publication), we provide the inference script, test data, and a subset of the trained models for inference use only. The full resources of this paper, including the training script, the entire HypoTranslate dataset, and all models, will be open-sourced upon publication to benefit the community.

Conda Environment Configuration

Our code is built on lit-gpt; please refer to the official tutorial to build the conda environment. Then install the required packages with the following command:

pip install -r requirements.txt

Code

  • Model code: lit_gpt/gentrans.py;
  • Inference script: infer.sh;

Models

  • For the LLMs, please refer to the tutorial for configuration steps; it supports many mainstream LLMs such as LLaMA-2;
  • For well-trained adapter checkpoints, please refer to our HuggingFace repo.

Dataset

We have released our HypoTranslate dataset on HuggingFace.

Inference Usage

We provide two well-trained models and the corresponding test sets for inference, namely for the FLEURS Fr-En and En-Fr speech translation (ST) tasks. Before running inference, please complete the following preparation steps:

  1. Go to infer.sh:
    • Specify your conda environment <your-conda-env>;
    • Specify the source-target language pair, where we provide two example pairs fr-en and en-fr;
    • Specify the LLM size: 7b for fr-en, 13b for en-fr;
  2. Download and convert the LLaMA-2 pre-trained checkpoint;
  3. Go to inference/gentrans.py:
    • Specify the experiment directory exp_dir: the root directory of this repository (where this README.md is located);
    • Specify the data directory data_dir: the absolute path of test data (.pt file);
    • Specify the LLM directory llm_dir: the absolute path of your downloaded LLaMA-2 checkpoint;
    • Specify the adapter directory adapter_dir: the absolute path of our released adapter checkpoint;
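Before launching infer.sh, a quick pre-flight check can catch mis-set paths. The following Python sketch encodes the language-pair-to-LLM-size mapping from step 1 and the path requirements from step 3. The helpers resolve_llm_size and check_inference_paths are hypothetical and not part of the repository; the parameter names simply mirror the exp_dir, data_dir, llm_dir, and adapter_dir settings described above.

```python
from pathlib import Path
from typing import Dict, List

# LLM size for each released language pair (step 1 of the preparation list).
LLM_SIZE_BY_PAIR: Dict[str, str] = {"fr-en": "7b", "en-fr": "13b"}


def resolve_llm_size(pair: str) -> str:
    """Return the LLaMA-2 size to use for a language pair such as "fr-en"."""
    if pair not in LLM_SIZE_BY_PAIR:
        raise ValueError(f"unsupported pair {pair!r}; expected one of {sorted(LLM_SIZE_BY_PAIR)}")
    return LLM_SIZE_BY_PAIR[pair]


def check_inference_paths(exp_dir: str, data_dir: str, llm_dir: str, adapter_dir: str) -> List[str]:
    """Return the names of settings that look wrong (hypothetical helper)."""
    data = Path(data_dir)
    checks = {
        # exp_dir: the repository root, i.e. the directory containing README.md.
        "exp_dir": (Path(exp_dir) / "README.md").is_file(),
        # data_dir: absolute path to an existing .pt test-data file.
        "data_dir": data.is_absolute() and data.is_file() and data.suffix == ".pt",
        # llm_dir / adapter_dir: absolute paths that must already exist.
        "llm_dir": Path(llm_dir).is_absolute() and Path(llm_dir).exists(),
        "adapter_dir": Path(adapter_dir).is_absolute() and Path(adapter_dir).exists(),
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    print(resolve_llm_size("fr-en"))
    print(check_inference_paths(".", "/path/to/test.pt", "/path/to/llama-2", "/path/to/adapter"))
```

An empty list from check_inference_paths means every setting passed; any returned name points at the variable to fix before running inference.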

Now you can run inference for your specified language direction with:

bash infer.sh

The script then reports the BLEU score of GenTranslate on your specified test set.
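For reference, BLEU combines clipped n-gram precisions p_n (n = 1..4) with a brevity penalty: BLEU = BP * exp((1/4) * sum of log p_n), where BP = 1 if the total hypothesis length c exceeds the total reference length r, and exp(1 - r/c) otherwise. The Python sketch below implements this textbook corpus-level formula (Papineni et al., 2002), assuming whitespace tokenization, a single reference per sentence, and no smoothing. It is an illustration only: we do not know which BLEU implementation infer.sh uses, and toolkits such as sacreBLEU apply their own tokenization, so these numbers should not be expected to match the script's output exactly.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: Sequence[str], references: Sequence[str], max_n: int = 4) -> float:
    """Textbook corpus BLEU (0-100), whitespace tokens, one reference, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter intersection keeps min(count_hyp, count_ref): the clipping step.
            matches[n - 1] += sum((_ngrams(h, n) & _ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the"], refs))
```

In the usage example every n-gram of the hypothesis matches, so the score is reduced only by the brevity penalty, exp(1 - 6/5).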

References

@article{hu2024gentranslate,
  title={GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators},
  author={Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong},
  journal={arXiv preprint arXiv:2402.06894},
  year={2024}
}
