A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)

This repository contains the data for the Third Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2019). We will present our paper at COLING 2020:

Title: A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu
Link: https://arxiv.org/abs/2004.03116
Venue: COLING 2020
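To make the sentence cloze format concrete, here is a minimal Python sketch. It assumes a toy layout that is not necessarily the official data schema: blanks are marked `[BLANK1]`, `[BLANK2]`, ..., each blank is answered with an index into a shared candidate list, and that list may include a fake candidate that belongs to no blank. The passage is in English only for readability (the real data is Chinese), and `fill_blanks` is a hypothetical helper.

```python
import re

# Toy example in a hypothetical layout (not the official CMRC 2019 schema).
passage = "Tom woke up late. [BLANK1] He ran to the station. [BLANK2]"
candidates = [
    "He missed the train anyway.",      # index 0
    "His alarm had not gone off.",      # index 1
    "Paris is the capital of France.",  # index 2: fake candidate, fills no blank
]
answers = [1, 0]  # answers[k - 1] is the candidate index for [BLANKk]


def fill_blanks(passage: str, candidates: list[str], answers: list[int]) -> str:
    """Replace every [BLANKk] marker with the candidate chosen for blank k."""

    def substitute(match: re.Match) -> str:
        k = int(match.group(1))  # 1-based blank number taken from the marker
        return candidates[answers[k - 1]]

    return re.sub(r"\[BLANK(\d+)\]", substitute, passage)


print(fill_blanks(passage, candidates, answers))
```

Because there are more candidates than blanks, a system cannot fill the last blank by elimination; it also has to recognize which candidate fits nowhere.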

Open Challenge Leaderboard (New!)

Keep track of the latest state-of-the-art systems on the CMRC 2019 dataset: https://ymcui.github.io/cmrc2019/

Submission Guidelines

If you would like to test your model on the hidden test and challenge sets, please follow the instructions for submitting your model via the CodaLab worksheet: https://worksheets.codalab.org/worksheets/0xe856b40d21de45bf898cd1d3c5135afe

Directory Guide

  • baseline: a simple baseline system based on Chinese BERT

  • eval: contains the official evaluation script

  • data: contains the official evaluation data

  • sample_submission: sample submissions for the CodaLab competition platform (trial_rand_submission.zip is a randomly generated prediction file; trial_submission.zip contains the BERT baseline predictions)
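trial_rand_submission.zip is described as a randomly generated prediction file. The sketch below shows one way such a random baseline could be produced. It assumes that predictions are a JSON object mapping each passage ID to one candidate index per blank, and that each candidate fills at most one blank; both are assumptions, so check the files in sample_submission for the exact format and filename CodaLab expects. The functions `make_random_predictions` and `save_predictions` are hypothetical.

```python
import json
import random


def make_random_predictions(
    passages: dict[str, tuple[int, int]], seed: int = 0
) -> dict[str, list[int]]:
    """Pick random candidate indices for every blank.

    `passages` maps a passage ID to (number of blanks, number of candidates).
    Assumption: each candidate fills at most one blank, so indices are drawn
    without replacement.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    return {
        pid: rng.sample(range(n_candidates), n_blanks)
        for pid, (n_blanks, n_candidates) in passages.items()
    }


def save_predictions(predictions: dict[str, list[int]], path: str) -> None:
    """Write predictions as UTF-8 JSON (assumed submission layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=2)
```

For example, `make_random_predictions({"TRIAL_0001": (3, 4)})` returns three distinct indices between 0 and 3 for that (hypothetical) passage ID. Seeding the generator keeps a random baseline reproducible across runs.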

Baseline System

We provide a BERT-based baseline system for participants (see the baseline directory for details).

Results on other sets will be announced later.

QAC: Question-Level Accuracy

PAC: Passage-Level Accuracy

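| Data             | # Passages | # Queries | QAC     | PAC     | Fake Candidates | Availability |
|------------------|------------|-----------|---------|---------|-----------------|--------------|
| Trial Data       | 139        | 1,504     | 71.941% | 28.776% | No              | Public       |
| Train Data       | 9,638      | 100,009   | N/A     | N/A     | No              | Public       |
| Development Data | 300        | 3,053     | 70.586% | 13.333% | Yes             | Public       |
| Qualifying Data  | 500        | 5,081     | 70.01%  | 8.20%   | Yes             | Semi-Hidden  |
| Test Data        | -          | -         | -       | -       | Yes             | Hidden       |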
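PAC can be far lower than QAC because the two metrics aggregate differently. Below is a minimal sketch of both, assuming gold answers and predictions are stored as a dict from passage ID to per-blank candidate indices, and assuming, as the name suggests, that a passage counts toward PAC only when every one of its blanks is correct. It is not the official scorer; the script in the eval directory is authoritative, and `qac_pac` is a hypothetical name.

```python
def qac_pac(
    gold: dict[str, list[int]], pred: dict[str, list[int]]
) -> tuple[float, float]:
    """Return (QAC, PAC) as fractions between 0 and 1.

    QAC: correctly filled blanks / all blanks.
    PAC: passages with every blank correct / all passages.
    Missing passages or blanks in `pred` count as wrong.
    """
    total_blanks = correct_blanks = correct_passages = 0
    for pid, gold_answers in gold.items():
        pred_answers = pred.get(pid, [])
        # zip stops at the shorter list, so missing predictions score zero
        hits = sum(p == g for p, g in zip(pred_answers, gold_answers))
        total_blanks += len(gold_answers)
        correct_blanks += hits
        if hits == len(gold_answers):
            correct_passages += 1
    qac = correct_blanks / total_blanks if total_blanks else 0.0
    pac = correct_passages / len(gold) if gold else 0.0
    return qac, pac


gold = {"p1": [1, 0, 2], "p2": [0, 1]}
pred = {"p1": [1, 0, 2], "p2": [1, 1]}
print(qac_pac(gold, pred))  # (0.8, 0.5)
```

In this toy case four of five blanks are right (QAC 0.8), but only p1 is fully correct (PAC 0.5): a single wrong blank is enough to lose the whole passage.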

International Standard Language Resource Number (ISLRN)

ISLRN: 813-010-842-493-2

http://www.islrn.org/resources/resources_info/8624/

Reference

If you wish to use our data in your research, please cite our paper:

@inproceedings{cui-etal-2020-cmrc2019,
  title={A Sentence Cloze Dataset for Chinese Machine Reading Comprehension},
  author={Cui, Yiming and Liu, Ting and Yang, Ziqing and Chen, Zhipeng and Ma, Wentao and Che, Wanxiang and Wang, Shijin and Hu, Guoping},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)},
  year={2020}
}

Organization Committee

Host: Chinese Information Processing Society of China (CIPS)
Organizer: Joint Laboratory of HIT and iFLYTEK Research (HFL)
Sponsor: iFLYTEK Co., Ltd. and iFLYTEK Research (Hebei)

Evaluation Co-Chairs

Ting Liu, Harbin Institute of Technology
Yiming Cui, Joint Laboratory of HIT and iFLYTEK Research

Official HFL WeChat Account

Follow Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.

Contact us

If you have any questions, feel free to contact us.
Email: cmrc2019 [aT] 126 [DoT] com
Forum: CodaLab Competition Forum
CMRC 2019 Official Website (Chinese): https://cmrc2019.hfl-rc.com/
CMRC 2019 Official Website (English): https://cmrc2019.hfl-rc.com/english/
