  • Stars: 336
  • Rank 125,564 (Top 3 %)
  • Language
  • Created about 6 years ago
  • Updated about 2 years ago

Repository Details

Repository to track the progress in Vietnamese Natural Language Processing, including the datasets and the current state-of-the-art for the most common Vietnamese NLP tasks.

Tracking Progress in Vietnamese NLP

This document aims to track the progress in Vietnamese Natural Language Processing and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

Table of contents

Contributing

If you would like to add a new result, you can do so with a pull request (PR). To minimize noise and keep maintenance manageable, results reported in published papers are preferred (indicate the venue of publication in your PR); an exception may be made for influential preprints. Each result should include the name of the method, the citation, the score, and a link to the paper. Add it so that the table stays sorted, with the best result on top.
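The sorting rule above can be sketched in a few lines of Python. This is only an illustration: the `Result` class, the `add_result` and `render_rows` helpers, the pipe-delimited row layout, and the example entries are all hypothetical and not part of the repository. The sketch also assumes by default that a higher score is better, which does not hold for error-rate metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One row of a results table (hypothetical structure for illustration)."""
    model: str
    score: float
    paper: str  # citation text
    link: str   # URL of the paper


def add_result(results, new, higher_is_better=True):
    """Return a new list with `new` added, sorted so the best score comes first."""
    return sorted(results + [new], key=lambda r: r.score, reverse=higher_is_better)


def render_rows(results):
    """Render each result as a pipe-delimited row: model, score, linked citation."""
    return [f"| {r.model} | {r.score} | [{r.paper}]({r.link}) |" for r in results]


# Placeholder entries, not real published results.
table = [
    Result("Method A", 92.5, "Author A (2019)", "https://example.org/a"),
    Result("Method B", 90.1, "Author B (2018)", "https://example.org/b"),
]
table = add_result(table, Result("Method C", 91.3, "Author C (2020)", "https://example.org/c"))
for row in render_rows(table):
    print(row)
```

For metrics where lower is better, passing `higher_is_better=False` keeps the best (lowest) score on top.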

If your pull request contains a new result, please make sure that "new result" appears somewhere in the title of the PR. This way, we can track which tasks are the most active and receive the most attention.

To make reproduction easier, we recommend adding a link to an implementation for each method, if one is available. You can add a Code column (see below) to the table if it does not exist. In the Code column, mark an official implementation with Official. If only an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

| Model | Score | Paper/Source | Code     |
| ----- | ----- | ------------ | -------- |
|       |       |              | Official |
|       |       |              | Link     |

To add a new dataset or task, follow the steps below. Any new dataset should have been used for evaluation in at least one published paper besides the one that introduced it.

  1. Fork the repository.
  2. If your task is completely new, create a new file and link to it in the table of contents above. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
  3. Briefly describe the dataset/task and include relevant references.
  4. Describe the evaluation setting and evaluation metric.
  5. Show what an annotated example of the dataset/task looks like.
  6. Add a download link if available.
  7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset).
  8. Submit your change as a pull request.
| Model | Score | Paper/Source | Code |
| ----- | ----- | ------------ | ---- |
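Before submitting a table for a new dataset or task, the two table-related requirements in the steps above (at least two results, best result on top) can be checked mechanically. The following Python sketch is only an illustration; the `check_new_table` helper and its messages are hypothetical, and it assumes by default that a higher score is better.

```python
def check_new_table(scores, higher_is_better=True):
    """Return a list of problems found in a new dataset's results table.

    `scores` holds the Score column in the order the rows appear in the table.
    """
    problems = []
    if len(scores) < 2:
        problems.append("table needs at least two results")
    # The table is expected to list the best result first.
    expected = sorted(scores, reverse=higher_is_better)
    if list(scores) != expected:
        problems.append("results are not sorted with the best on top")
    return problems


# Placeholder scores, not real published results.
print(check_new_table([95.0, 93.2]))  # no problems found
print(check_new_table([90.0, 95.0]))  # reports the ordering problem
```

An empty list means the table passes both checks; the remaining steps, such as describing the evaluation setting, still need a manual review.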

More Repositories

  1. underthesea: Underthesea - Vietnamese NLP Toolkit (Python, 1,371 stars)
  2. chatbot: Vietnamese Chatbot (C, 89 stars)
  3. dictionary: Vietnamese Dictionary (Python, 73 stars)
  4. automatic_speech_recognition: Vietnamese Automatic Speech Recognition (Python, 65 stars)
  5. ner: Vietnamese Named Entity Recognition (Python, 50 stars)
  6. word_tokenize: Vietnamese Word Tokenize (Python, 48 stars)
  7. slp3-vietnamese: Speech and Language Processing 3rd edition Vietnamese Translation (TeX, 23 stars)
  8. resources: Open Vietnamese NLP Resources (Python, 18 stars)
  9. sentiment: Vietnamese Sentiment Analysis (Python, 17 stars)
  10. corpus.viwiki: Vietnamese Wikipedia Corpus (Python, 16 stars)
  11. pos_tag: Vietnamese POS Tagging (Python, 12 stars)
  12. speech_classification: Vietnamese Speech Classification experiments (Python, 7 stars)
  13. treebank: An attempt to build Vietnamese Open Treebank (Python, 7 stars)
  14. terminology: Terminology in linguistics and natural language processing (Python, 6 stars)
  15. sent_tokenize: Vietnamese Sentence Boundary Detection (Python, 5 stars)
  16. text_normalization: Vietnamese Text Normalization (Python, 5 stars)
  17. chunking: Vietnamese Chunking experiments (Python, 5 stars)
  18. text_to_speech: Vietnamese Text to Speech (Perl, 4 stars)
  19. lang_detect: Vietnamese Language Detection (Python, 3 stars)
  20. word_embeddings: Vietnamese Word Embeddings (Python, 2 stars)
  21. playground: Open Vietnamese NLP Experiments (Python, 2 stars)
  22. knowledge: Knowledge Base (Julia, 2 stars)
  23. publications: Collection of reports in the field of Vietnamese natural language processing (2 stars)
  24. amrbank: An attempt to build Vietnamese AMR Bank (2 stars)
  25. machine_translation: Vietnamese Machine Translation Experiments (2 stars)
  26. normalization: Vietnamese Text normalization (Jupyter Notebook, 1 star)