# Pretrained BigBird Model for Korean

What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation

한국어 | English
## What is BigBird?

BigBird is a sparse-attention-based model introduced in BigBird: Transformers for Longer Sequences, and it can handle longer sequences than a conventional BERT.
## How to Use

You can use the model uploaded to the 🤗 Huggingface Hub right away :)

- We recommend `transformers>=4.11.0`, which resolves some known issues (see the MRC-related PR).
- You must use `BertTokenizer` instead of `BigBirdTokenizer` (`BertTokenizer` is loaded when you use `AutoTokenizer`).
- For detailed usage, see the BigBird Transformers documentation.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
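As a quick sanity check, here is a minimal inference sketch; the input string is a placeholder, and the 4096-token limit comes from the pretraining setup described below:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")

# Placeholder document; replace with your own long text.
long_text = "한국어 위키백과 문서처럼 긴 텍스트 ..."

# Encode up to the pretrained maximum of 4096 tokens.
inputs = tokenizer(long_text, max_length=4096, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```

Note that when the input is too short for block sparse attention, 🤗 Transformers falls back to full attention for BigBird models (with a warning), so short inputs also work.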
## Pretraining

For details, see [Pretraining BigBird].
| | Hardware | Max len | LR | Batch | Train Step | Warmup Step |
| --- | --- | --- | --- | --- | --- | --- |
| KoBigBird-BERT-Base | TPU v3-8 | 4096 | 1e-4 | 32 | 2M | 20k |
- λͺ¨λμ λ§λμΉ, νκ΅μ΄ μν€, Common Crawl, λ΄μ€ λ°μ΄ν° λ± λ€μν λ°μ΄ν°λ‘ νμ΅
ITC (Internal Transformer Construction)
λͺ¨λΈλ‘ νμ΅ (ITC vs ETC)
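For reference, this is roughly how ITC-style sparse attention is expressed with 🤗 Transformers' `BigBirdConfig`. The values below are the library defaults rather than the exact pretraining configuration, so treat this as an illustrative sketch:

```python
from transformers import BigBirdConfig

# Illustrative sketch: values follow the HF defaults; check the released
# config at monologg/kobigbird-bert-base for the exact settings.
config = BigBirdConfig(
    attention_type="block_sparse",  # ITC: global tokens are drawn from the sequence itself
    block_size=64,                  # size of each attention block
    num_random_blocks=3,            # random blocks attended to per query block
    max_position_embeddings=4096,   # matches the pretraining max length
)
```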
## Evaluation Result

### 1. Short Sequence (<=512)

For details, see [Finetune on Short Sequence Dataset].
| | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
| --- | --- | --- | --- | --- | --- |
| KoELECTRA-Base-v3 | 91.13 | 86.87 | 93.14 | 85.66 / 93.94 | 59.54 / 65.64 |
| KLUE-RoBERTa-Base | 91.16 | 86.30 | 92.91 | 85.35 / 94.53 | 69.56 / 74.64 |
| KoBigBird-BERT-Base | 91.18 | 87.17 | 92.61 | 87.08 / 94.71 | 70.33 / 75.34 |
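For short-sequence tasks such as NSMC, the checkpoint can be finetuned with a standard classification head. A minimal sketch, assuming binary sentiment labels (the head is newly initialized, so it still needs finetuning before use):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
# num_labels=2 for binary sentiment (NSMC); the classification head is randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "monologg/kobigbird-bert-base", num_labels=2
)

inputs = tokenizer("배우 연기가 정말 좋았다!", truncation=True, max_length=512, return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, 2)
```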
### 2. Long Sequence (>=1024)

For details, see [Finetune on Long Sequence Dataset].
| | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
| --- | --- | --- | --- | --- |
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |
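For extractive QA on long documents (e.g., Korquad 2.1), the same checkpoint can be loaded with a QA head and fed contexts up to 4096 tokens. A hedged sketch: the QA head is newly initialized, so finetune before expecting sensible spans, and `context` is a placeholder:

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModelForQuestionAnswering.from_pretrained("monologg/kobigbird-bert-base")

question = "KoBigBird가 처리할 수 있는 최대 길이는?"
context = "..."  # placeholder for a long document (up to 4096 tokens)

# Truncate only the context, never the question.
inputs = tokenizer(question, context, max_length=4096,
                   truncation="only_second", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
```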
## Docs

- Pretraining BigBird
- Finetune on Short Sequence Dataset
- Finetune on Long Sequence Dataset
- Download Tensorflow v1 checkpoint
- GPU Benchmark result
## Citation

If you use KoBigBird, please cite it as follows.
```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```
## Contributors

## Acknowledgements

KoBigBird was built with Cloud TPU support from the Tensorflow Research Cloud (TFRC) program.

We also thank Seyun Ahn for the wonderful logo.