MAVEN-dataset
Source code and dataset for the EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".
Data
The dataset (version 1.0) is available from Tsinghua Cloud or Google Drive. The data format is described in this document.
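As a starting point for working with the downloaded files, here is a minimal loading sketch. It assumes, based on the released version 1.0 but not confirmed here (check the data format document), that each split is a JSON Lines file with one document per line, and that each document has an `events` list whose items carry a `type` and a `mention` list with `sent_id`, `offset` (a [start, end) token span) and `trigger_word`. The function names are hypothetical helpers, not part of this repository.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path: str) -> list[dict]:
    """Load a JSON Lines split file (file name is up to the caller)."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl_lines(f)


def iter_triggers(doc: dict) -> Iterator[tuple[int, int, int, str, str]]:
    """Yield (sent_id, start, end, trigger_word, event_type) per mention.

    ASSUMED layout: doc["events"][i]["mention"][j] holds "sent_id",
    "offset" ([start, end) token span) and "trigger_word".
    """
    for event in doc.get("events", []):
        for mention in event.get("mention", []):
            start, end = mention["offset"]
            yield (mention["sent_id"], start, end,
                   mention["trigger_word"], event["type"])
```

For example, `for doc in load_jsonl("train.jsonl"): ...` iterates over a split, where `train.jsonl` stands in for whichever file name the download actually uses.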
We also release the document topics for data analysis and model development. The file docid2topic.json maps each document ID to its EventWiki topic label.
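For topic-based analysis, a common first step is grouping documents by topic. The sketch below assumes docid2topic.json is a flat JSON object of the form `{doc_id: topic_label}`; `load_topics` and `group_by_topic` are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_topics(path: str) -> dict[str, str]:
    """Load docid2topic.json, ASSUMED to be a flat {doc_id: topic} object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_topic(doc_ids, docid2topic: dict) -> dict:
    """Group document ids by topic; ids absent from the mapping go under None."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[docid2topic.get(doc_id)].append(doc_id)
    return dict(groups)
```

Keeping unmapped ids under a `None` key makes coverage gaps between a split and the topic file visible instead of silently dropping them.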
CodaLab
To obtain test-set results, submit your predictions to our permanent CodaLab competition (the older competition will be phased out soon). For details of the evaluation method, please refer to the evaluation script.
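The evaluation script is the authoritative definition of the metric. For local validation, the sketch below shows standard micro-averaged precision, recall and F1 for event detection, under two assumptions that are not confirmed here: each candidate trigger has exactly one gold type label, and a designated label (0 by default) means "not an event". It is a generic illustration, not the official script.

```python
def micro_prf(gold: dict, pred: dict, na_label=0) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over trigger candidates.

    gold and pred map a candidate id to a type label; na_label means
    "no event". Candidates missing from pred count as na_label, and
    predictions for ids not in gold are ignored.
    """
    tp = pred_pos = gold_pos = 0
    for cand, gold_label in gold.items():
        pred_label = pred.get(cand, na_label)
        if gold_label != na_label:
            gold_pos += 1
        if pred_label != na_label:
            pred_pos += 1
            if pred_label == gold_label:
                tp += 1
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Counting only non-NA labels on both sides means that correctly predicting "no event" earns no credit, so the score reflects detection and typing of real triggers.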
Code
We release the source code for the baselines: DMCNN, BiLSTM, BiLSTM+CRF, MOGANED and DMBERT.
Citation
If this dataset or code helps your work, please cite our paper.
@inproceedings{wang2020MAVEN,
title={{MAVEN}: A Massive General Domain Event Detection Dataset},
author={Wang, Xiaozhi and Wang, Ziqi and Han, Xu and Jiang, Wangyi and Han, Rong and Liu, Zhiyuan and Li, Juanzi and Li, Peng and Lin, Yankai and Zhou, Jie},
booktitle={Proceedings of EMNLP 2020},
year={2020}
}