KorNLU Datasets
This is the dataset repository for our paper "KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding".
We introduce KorNLI and KorSTS, Korean datasets for natural language inference (NLI) and semantic textual similarity (STS).
KorNLI
Dataset Overview
| KorNLI | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | SNLI, MNLI | XNLI | XNLI |
| Translated by | - | Machine | Human | Human |
| # Examples | 950,354 | 942,854 | 2,490 | 5,010 |
| Avg. # words (premise) | 13.6 | 13.6 | 13.0 | 13.1 |
| Avg. # words (hypothesis) | 7.1 | 7.2 | 6.8 | 6.8 |
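Summary statistics like the ones above can be computed from a split with a few lines of Python. This is a minimal sketch, not the repository's own tooling: it assumes each split is a tab-separated file with a header row containing `sentence1` (premise), `sentence2` (hypothesis) and `gold_label` columns, and it counts words by splitting on whitespace, which may not match how the table was computed.

```python
import csv
import io
from collections import Counter


def load_nli_tsv(f):
    """Read (premise, hypothesis, label) triples from a tab-separated file object.

    Assumption: the header row names the columns sentence1, sentence2, gold_label.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row["sentence1"], row["sentence2"], row["gold_label"]) for row in reader]


def avg_words(sentences):
    """Average number of whitespace-separated tokens per sentence."""
    return sum(len(s.split()) for s in sentences) / len(sentences)


# Tiny in-memory sample standing in for a real split file (hypothetical layout).
sample = io.StringIO(
    "sentence1\tsentence2\tgold_label\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t이해하려고 노력하고 있었어요.\tentailment\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 처음부터 그것을 잘 이해했다.\tcontradiction\n"
)
rows = load_nli_tsv(sample)
print(len(rows))                            # 2
print(Counter(label for _, _, label in rows))
print(avg_words([p for p, _, _ in rows]))   # 5.0
print(avg_words([h for _, h, _ in rows]))   # 4.0
```

Opening the real files with `open(path, encoding="utf-8", newline="")` and passing the file object to `load_nli_tsv` should work the same way, provided the column-name assumption holds.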
Examples
| Example | English Translation | Label |
|---|---|---|
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 이해하려고 노력하고 있었어요. | P: I was just there just trying to figure it out. H: I was trying to understand. | Entailment |
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 나는 처음부터 그것을 잘 이해했다. | P: I was just there just trying to figure it out. H: I understood it well from the beginning. | Contradiction |
| P: 저는, 그냥 알아내려고 거기 있었어요. H: 나는 돈이 어디로 갔는지 이해하려고 했어요. | P: I was just there just trying to figure it out. H: I was trying to understand where the money went. | Neutral |
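Each KorNLI example is a premise/hypothesis pair with one of three labels, so the task is three-way classification and a natural way to score a model is accuracy. A minimal sketch follows; the lowercase label spellings and the integer ids assigned to them are assumptions for illustration, not a format defined by the dataset.

```python
# Hypothetical label vocabulary; the id order is arbitrary.
LABELS = ("entailment", "neutral", "contradiction")
LABEL2ID = {label: i for i, label in enumerate(LABELS)}


def encode_labels(labels):
    """Map label strings to integer ids; unknown labels raise KeyError."""
    return [LABEL2ID[label.strip().lower()] for label in labels]


def accuracy(gold_ids, pred_ids):
    """Fraction of examples whose predicted class equals the gold class."""
    if len(gold_ids) != len(pred_ids):
        raise ValueError("gold and predicted lists must have the same length")
    correct = sum(g == p for g, p in zip(gold_ids, pred_ids))
    return correct / len(gold_ids)


gold = encode_labels(["Entailment", "Contradiction", "Neutral"])  # the three examples above
pred = [LABEL2ID["entailment"], LABEL2ID["neutral"], LABEL2ID["neutral"]]  # made-up predictions
print(gold)                  # [0, 2, 1]
print(accuracy(gold, pred))  # 0.6666666666666666
```

Failing loudly on an unexpected label string is deliberate: a silent default would hide a mismatch between the assumed label spellings and the ones in the files.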
KorSTS
Dataset Overview
| KorSTS | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | STS-B | STS-B | STS-B |
| Translated by | - | Machine | Human | Human |
| # Examples | 8,628 | 5,749 | 1,500 | 1,379 |
| Avg. # words | 7.7 | 7.5 | 8.7 | 7.6 |
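A KorSTS example is two sentences plus a real-valued similarity score on the STS-B scale, from 0 (completely dissimilar) to 5 (equivalent in meaning). The loader below is a hedged sketch: it assumes tab-separated files with a header row that includes `score`, `sentence1` and `sentence2` columns (any other columns are ignored), which should be checked against the actual files.

```python
import csv
import io


def load_sts_tsv(f):
    """Read (sentence1, sentence2, score) triples from a tab-separated file object.

    Assumption: the header row includes 'score', 'sentence1' and 'sentence2';
    other columns are ignored.
    """
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        score = float(row["score"])
        # Reject values outside the 0-5 STS scale to catch column mix-ups early.
        if not 0.0 <= score <= 5.0:
            raise ValueError(f"score out of range [0, 5]: {score}")
        pairs.append((row["sentence1"], row["sentence2"], score))
    return pairs


# Tiny in-memory sample standing in for a real split file (hypothetical layout).
sample = io.StringIO(
    "score\tsentence1\tsentence2\n"
    "4.2\t한 남자가 음식을 먹고 있다.\t한 남자가 뭔가를 먹고 있다.\n"
    "0.0\t한 여성이 고기를 요리하고 있다.\t한 남자가 말하고 있다.\n"
)
pairs = load_sts_tsv(sample)
print(len(pairs))   # 2
print(pairs[0][2])  # 4.2
```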
Examples
| Example | English Translation | Label (similarity score, 0 to 5) |
|---|---|---|
| 한 남자가 음식을 먹고 있다. 한 남자가 뭔가를 먹고 있다. | A man is eating food. A man is eating something. | 4.2 |
| 한 비행기가 착륙하고 있다. 애니메이션화된 비행기 하나가 착륙하고 있다. | A plane is landing. A animated airplane is landing. | 2.8 |
| 한 여성이 고기를 요리하고 있다. 한 남자가 말하고 있다. | A woman is cooking meat. A man is speaking. | 0.0 |
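Because KorSTS labels are graded scores rather than classes, STS systems are usually evaluated by correlating predicted scores with the gold scores. The sketch below computes Spearman's rank correlation in plain Python (Pearson correlation over tie-averaged ranks); the predicted scores are made-up values for illustration, and `scipy.stats.spearmanr` computes the same statistic if SciPy is available.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(gold, pred):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(gold), average_ranks(pred))


gold = [4.2, 2.8, 0.0]  # gold scores of the three examples above
pred = [3.9, 3.1, 0.5]  # hypothetical model predictions
print(spearman(gold, pred))  # 1.0
```

Spearman's rho only looks at the ordering of scores, so a model that ranks the three pairs correctly gets a perfect 1.0 even though its raw predictions differ from the gold values.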
License
Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0)
References
If you use KorNLI or KorSTS for research, please cite our paper:
@article{ham2020kornli,
title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
journal={arXiv preprint arXiv:2004.03289},
year={2020}
}