• Stars: 407
• Rank: 106,183 (Top 3%)
• Language: Python
• License: Creative Commons ...
• Created almost 7 years ago
• Updated over 2 years ago

Repository Details

A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

This repository contains the data for the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We will present our paper at EMNLP 2019.

Title: A Span-Extraction Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/D19-1600/
Venue: EMNLP-IJCNLP 2019

Open Challenge Leaderboard (New!)

Keep track of the latest state-of-the-art systems on the CMRC 2018 dataset.
https://ymcui.github.io/cmrc2018/

CMRC 2018 Public Datasets

Please download the CMRC 2018 public datasets from the following CodaLab worksheet.
https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce

Submission Guidelines

If you would like to test your model on the hidden test and challenge sets, please follow the instructions on how to submit your model via the CodaLab worksheet.
https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/

**Note that the test set on CLUE is NOT the complete test set. If you wish to evaluate your model OFFICIALLY on CMRC 2018, you should follow the guidelines here.**

Quick Load Through 🤗datasets

You can also access this dataset through the HuggingFace datasets library as follows:

# Notebook syntax; from a shell, run `pip install datasets` without the leading "!".
!pip install datasets
from datasets import load_dataset
dataset = load_dataset('cmrc2018')

More details on the options and usage of this library can be found in the nlp repository at https://github.com/huggingface/nlp
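Since CMRC 2018 is a span-extraction dataset, a quick sanity check after loading is to confirm that each answer can be recovered from its passage by character offset. The sketch below assumes each example follows the SQuAD-style layout, with `context`, `question`, and `answers` fields, where `answers` holds parallel `text` and `answer_start` lists. That layout is an assumption and is not stated in this README. The sample record and the helper name `misaligned_answers` are hypothetical and used only for illustration.

```python
from typing import Dict, List

# Hypothetical record in the assumed SQuAD-style layout.
# The text is illustrative and is NOT taken from CMRC 2018.
sample = {
    "id": "EXAMPLE_0",
    # "Harbin Institute of Technology is located in Harbin, Heilongjiang Province."
    "context": "哈尔滨工业大学位于黑龙江省哈尔滨市。",
    # "Where is Harbin Institute of Technology located?"
    "question": "哈尔滨工业大学位于哪里?",
    "answers": {"text": ["黑龙江省哈尔滨市"], "answer_start": [9]},
}


def misaligned_answers(example: Dict) -> List[str]:
    """Return answer texts whose character offset does not match the context."""
    context = example["context"]
    answers = example["answers"]
    bad = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        # A span-extraction answer must be an exact substring at its offset.
        if context[start:start + len(text)] != text:
            bad.append(text)
    return bad


print(misaligned_answers(sample))
```

With the dataset object loaded above, the same helper could be applied to a real example, for instance `misaligned_answers(dataset["train"][0])`, assuming a split named `train` exists and the fields match the layout described here.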

Reference

If you wish to use our data in your research, please cite:

@inproceedings{cui-emnlp2019-cmrc2018,
    title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
    author = "Cui, Yiming  and
      Liu, Ting  and
      Che, Wanxiang  and
      Xiao, Li  and
      Chen, Zhipeng  and
      Ma, Wentao  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1600",
    doi = "10.18653/v1/D19-1600",
    pages = "5886--5891",
}

International Standard Language Resource Number (ISLRN)

ISLRN: 013-662-947-043-2

http://www.islrn.org/resources/resources_info/7952/

Official HFL WeChat Account

Follow the Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.

Contact us

Please submit an issue.

More Repositories

1. Chinese-LLaMA-Alpaca: Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Python, 17,608 stars)
2. Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series) (Python, 9,356 stars)
3. Chinese-LLaMA-Alpaca-2: Chinese LLaMA-2 & Alpaca-2 LLMs, phase two of the project, with 64K long-context models (Python, 7,066 stars)
4. Chinese-XLNet: Pre-Trained Chinese XLNet (Python, 1,640 stars)
5. Chinese-ELECTRA: Pre-trained Chinese ELECTRA (Python, 1,381 stars)
6. Chinese-LLaMA-Alpaca-3: Chinese Llama-3 LLMs, phase three of the Chinese LLaMA-Alpaca project, developed from Meta Llama 3 (Python, 901 stars)
7. MacBERT: Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT) (611 stars)
8. Chinese-Mixtral: Chinese Mixtral mixture-of-experts (MoE) LLMs (Python, 526 stars)
9. PERT: Pre-training BERT with Permuted Language Model (345 stars)
10. Chinese-RC-Datasets: Collections of Chinese reading comprehension datasets (211 stars)
11. LERT: A Linguistically-motivated Pre-trained Language Model (Python, 186 stars)
12. Chinese-Cloze-RC: A Chinese Cloze-style RC Dataset: People's Daily & Children's Fairy Tale (CFT) (165 stars)
13. cmrc2019: A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019) (Python, 125 stars)
14. LAMB_Optimizer_TF: LAMB Optimizer for Large Batch Training (TensorFlow version) (Python, 120 stars)
15. cmrc2017: The First Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2017) (Python, 92 stars)
16. Eval-on-NN-of-RC: Empirical Evaluation on Current Neural Networks on Cloze-style Reading Comprehension (86 stars)
17. Chinese-MobileBERT: Chinese MobileBERT model (Python, 76 stars)
18. ChatGPT-in-Academia: Policies of scientific publishers and conferences towards large language models (LLMs), such as ChatGPT (72 stars)
19. Cross-Lingual-MRC: Cross-Lingual Machine Reading Comprehension (EMNLP 2019) (Python, 67 stars)
20. expmrc: ExpMRC: Explainability Evaluation for Machine Reading Comprehension (Python, 59 stars)
21. NLP-Review-Scorer: Score your NLP paper review (Jupyter Notebook, 24 stars)
22. ACL2020-PC-Blogs-Chinese: Chinese Version of ACL 2020 PC Blogs (14 stars)
23. mrc-model-analysis: Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models (iScience) (Python, 7 stars)