  • Stars: 105
  • Rank: 328,196 (Top 7 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Transformer-based translation quality estimation

TransQuest: Translation Quality Estimation with Cross-lingual Transformers

The goal of quality estimation (QE) is to evaluate the quality of a translation without access to a reference translation. High-accuracy QE that can be easily deployed for a number of language pairs is the missing piece in many commercial translation workflows, where QE systems have numerous potential uses. They can be used to select the best translation when several translation engines are available, or to inform the end user about the reliability of automatically translated content. In addition, QE systems can be used to decide whether a translation can be published as is in a given context, whether it requires human post-editing before publishing, or whether it should be translated from scratch by a human. Quality estimation can be done at different levels: document level, sentence level and word level.
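The workflow uses described above (picking the best engine output, then deciding between publishing, post-editing and retranslation) can be sketched in a few lines. This is only an illustrative sketch and not part of TransQuest: the engine names, the scores, the 0-1 score scale and both thresholds are hypothetical, and in practice the scores would come from a QE model.

```python
from typing import Dict, Tuple

# Hypothetical thresholds on an assumed 0-1 quality scale (higher is better).
PUBLISH_THRESHOLD = 0.8
POST_EDIT_THRESHOLD = 0.5


def select_best(candidates: Dict[str, float]) -> Tuple[str, float]:
    """Return the (engine, score) pair with the highest estimated quality."""
    engine = max(candidates, key=candidates.get)
    return engine, candidates[engine]


def route(score: float) -> str:
    """Map an estimated quality score to a workflow decision."""
    if score >= PUBLISH_THRESHOLD:
        return "publish"
    if score >= POST_EDIT_THRESHOLD:
        return "post-edit"
    return "retranslate"


# Made-up QE scores for the same source sentence translated by three engines.
scores = {"engine_a": 0.62, "engine_b": 0.85, "engine_c": 0.41}
best_engine, best_score = select_best(scores)
print(best_engine, route(best_score))
```

Suitable thresholds depend on the content type and on how the QE scores are scaled, so they would need to be calibrated on real data.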

With TransQuest, we have open-sourced our research in translation quality estimation, which also won the sentence-level direct assessment quality estimation shared task at WMT 2020. TransQuest outperforms current open-source quality estimation frameworks such as OpenKiwi and DeepQuest.

Features

  • Sentence-level translation quality estimation on both aspects: predicting post-editing effort and direct assessment.
  • Word-level translation quality estimation capable of predicting the quality of source words, target words and target gaps.
  • Performs significantly better than current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all the languages experimented with.
  • Pre-trained quality estimation models for fifteen language pairs are available on HuggingFace.
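To make "target words and target gaps" concrete, the sketch below separates word tags from gap tags for a single target sentence. It assumes the interleaved layout used in the WMT word-level shared tasks, where a target of n words carries 2n + 1 OK/BAD tags (gap, word, gap, ..., word, gap). Whether TransQuest returns its predictions in exactly this layout is an assumption here, and `split_target_tags` is a hypothetical helper, not a TransQuest function.

```python
from typing import List, Tuple


def split_target_tags(
    target: str, tags: List[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split interleaved gap/word tags into (word, tag) pairs and gap tags.

    Assumes the layout gap_0, word_1, gap_1, ..., word_n, gap_n.
    """
    words = target.split()
    if len(tags) != 2 * len(words) + 1:
        raise ValueError(
            f"expected {2 * len(words) + 1} tags for {len(words)} words, "
            f"got {len(tags)}"
        )
    gap_tags = tags[0::2]   # even positions: gaps before, between and after words
    word_tags = tags[1::2]  # odd positions: the words themselves
    return list(zip(words, word_tags)), gap_tags


# Made-up example: a BAD gap marks a position where a word is missing.
target = "the cat sat"
tags = ["OK", "OK", "BAD", "OK", "OK", "BAD", "OK"]
word_pairs, gap_tags = split_target_tags(target, tags)
print(word_pairs)
print(gap_tags)
```

A BAD word tag flags a target word that needs editing, while a BAD gap tag flags a position where one or more words appear to be missing.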

Table of Contents

  1. Installation - Install TransQuest locally using pip.
  2. Architectures - Check out the architectures implemented in TransQuest.
    1. Sentence-level Architectures - We have released two architectures, MonoTransQuest and SiameseTransQuest, to perform sentence-level quality estimation.
    2. Word-level Architecture - We have released MicroTransQuest to perform word-level quality estimation.
  3. Examples - We have provided several examples of how to use TransQuest in recent WMT quality estimation shared tasks.
    1. Sentence-level Examples
    2. Word-level Examples
  4. Pre-trained Models - We have provided pre-trained quality estimation models for fifteen language pairs, covering both sentence level and word level.
    1. Sentence-level Models
    2. Word-level Models
  5. Contact - Contact us about any issues with TransQuest.

Resources

Citations

If you are using the word-level architecture, please consider citing this paper, which was accepted at ACL 2021.

@InProceedings{ranasinghe2021,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {An Exploratory Analysis of Multilingual Word Level Quality Estimation with Cross-Lingual Transformers},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
year = {2021}
}

If you are using the sentence-level architectures, please consider citing these papers, which were presented at COLING 2020 and at WMT 2020 (held at EMNLP 2020).

@InProceedings{transquest:2020a,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest: Translation Quality Estimation with Cross-lingual Transformers},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
year = {2020}
}
@InProceedings{transquest:2020b,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
booktitle = {Proceedings of the Fifth Conference on Machine Translation},
year = {2020}
}

More Repositories

  1. Simple-Sentence-Similarity - Exploring the simple sentence similarity measurements using word embeddings (Python, 101 stars)
  2. Siamese-Recurrent-Architectures - Usage of Siamese Recurrent Neural network architectures for semantic textual similarity (Jupyter Notebook, 22 stars)
  3. MUDES - Multilingual Detection of Offensive Spans (Python, 7 stars)
  4. DeepOffense (Python, 5 stars)
  5. HASOC-2019 - Hate Speech and Offensive Content Identification in Indo-European Languages (Jupyter Notebook, 5 stars)
  6. MOLD - Marathi Offensive Language Dataset (Python, 2 stars)
  7. Thesis - Deep Learning based Semantic Textual Similarity Metric for Applications in Translation Technology (TeX, 2 stars)
  8. Germeval-Task-2 - This repo is the work done for Germeval Task 2, 2019: Shared Task on the Identification of Offensive Language by RGCL (Jupyter Notebook, 2 stars)
  9. HateSpans (Python, 2 stars)
  10. MUDES-UI - System Demonstration for MUDES (Python, 2 stars)
  11. Irony-Detection - This repo is the work done for IDAT 2019 Shared Task: Shared Task on detecting irony in Arabic tweets by RGCL (Python, 2 stars)
  12. Offenseval_2020 - SemEval-2020 Task 12: OffensEval 2020: Identifying and Categorizing Offensive Language in Social Media (Jupyter Notebook, 2 stars)
  13. STS-Transformers - Transformer based Semantic Textual Similarity (Python, 2 stars)
  14. SemEval-2019-Task-12-Toponym-Resolution-in-Scientific-Papers (Jupyter Notebook, 2 stars)
  15. Authorship-Detection (Python, 1 star)
  16. Biomedical-Semantic-Similarity-Estimation (Jupyter Notebook, 1 star)
  17. DistilOffense - Small, fast and cheap offensive language identification models (Python, 1 star)
  18. FT5 (Python, 1 star)
  19. Toponym-Resolution (Jupyter Notebook, 1 star)
  20. MultiTransQuest (Python, 1 star)
  21. Intelligent-Translation-Memories (Jupyter Notebook, 1 star)
  22. Aggression-Identification - Code for RANLP 2019 paper: "Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media" (Jupyter Notebook, 1 star)
  23. NeTTT-2024 (Jupyter Notebook, 1 star)