KoGPT
- KoGPT (Korean Generative Pre-trained Transformer)
Model Descriptions
KoGPT6B-ryan1.5b
- [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt), revision `KoGPT6B-ryan1.5b`
- [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt), revision `KoGPT6B-ryan1.5b-float16`
Hyperparameter | Value |
---|---|
Number of Parameters | 6,166,502,400 |
Number of Layers | 28 |
Model Dimension | 4,096 |
Feedforward Dimension | 16,384 |
Number of Heads | 16 |
Head Dimension | 256 |
Context Length | 2,048 |
Vocabulary Size | 64,512 |
Positional Encoding | Rotary Position Embedding (RoPE) |
RoPE Dimensions | 64 |
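The parameter count in the table can be sanity-checked from the other hyperparameters. The sketch below assumes a GPT-J-style layout (bias-free attention projections, biased MLP layers, one LayerNorm per block, and an untied output head with bias); the breakdown is our assumption, not an official spec.

```python
# Rough parameter-count check for KoGPT6B-ryan1.5b, assuming a
# GPT-J-style layout.
d_model, d_ff, n_layers, n_vocab = 4096, 16384, 28, 64512

wte = n_vocab * d_model                       # token embedding
attn = 4 * d_model * d_model                  # q, k, v, out projections (no bias)
mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # fc_in + fc_out with biases
ln = 2 * d_model                              # one LayerNorm (weight + bias) per block
per_layer = attn + mlp + ln
ln_f = 2 * d_model                            # final LayerNorm
lm_head = d_model * n_vocab + n_vocab         # output projection with bias

total = wte + n_layers * per_layer + ln_f + lm_head
print(f"{total:,}")  # 6,166,502,400 -- matches the table above
```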
Hardware requirements
KoGPT6B-ryan1.5b
GPU
The following is the recommended minimum GPU hardware for running this model:

- 32GB GPU RAM is the required minimum memory size
KoGPT6B-ryan1.5b-float16
GPU
The following is the recommended minimum GPU hardware for running this model:

- half precision requires an NVIDIA GPU based on the Volta, Turing, or Ampere architecture
- 16GB GPU RAM is the required minimum memory size
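The memory recommendations above line up with a back-of-envelope estimate for the weights alone (activations, KV cache, and framework overhead come on top, which is why the recommendations are higher):

```python
# Weight-only GPU memory estimate for KoGPT6B-ryan1.5b:
# 4 bytes per parameter in float32, 2 bytes in float16.
n_params = 6_166_502_400
gib = 1024 ** 3

print(f"float32: {n_params * 4 / gib:.1f} GiB")  # ~23.0 GiB -> 32GB GPU
print(f"float16: {n_params * 2 / gib:.1f} GiB")  # ~11.5 GiB -> 16GB GPU
```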
Usage
prompt
```bash
python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug
```
```bash
python -m kogpt
prompt> ์ธ๊ฐ์ฒ๋ผ ์๊ฐํ๊ณ , ํ๋ํ๋ '์ง๋ฅ'์ ํตํด ์ธ๋ฅ๊ฐ ์ด์ ๊น์ง ํ์ง ๋ชปํ๋
temperature(0.8)>
max_length(128)> 64
์ธ๊ฐ์ฒ๋ผ ์๊ฐํ๊ณ , ํ๋ํ๋ '์ง๋ฅ'์ ํตํด ์ธ๋ฅ๊ฐ ์ด์ ๊น์ง ํ์ง ๋ชปํ๋ ๋ฌธ์ ์ ํด๋ต์ ์ฐพ์ ์ ์์ ๊ฒ์ด๋ค. ๊ณผํ๊ธฐ์ ์ด ๊ณ ๋๋ก ๋ฐ๋ฌํ 21์ธ๊ธฐ๋ฅผ ์ด์๊ฐ ์ฐ๋ฆฌ ์์ด๋ค์๊ฒ ๊ฐ์ฅ ํ์ํ ๊ฒ์ ์ฌ๊ณ ๋ ฅ ํ๋ จ์ด๋ค. ์ฌ๊ณ ๋ ฅ ํ๋ จ์ ํตํด, ์ธ์
prompt>
...
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()

prompt = '์ธ๊ฐ์ฒ๋ผ ์๊ฐํ๊ณ , ํ๋ํ๋ \'์ง๋ฅ\'์ ํตํด ์ธ๋ฅ๊ฐ ์ด์ ๊น์ง ํ์ง ๋ชปํ๋'
with torch.no_grad():
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
    gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)  # print: ์ธ๊ฐ์ฒ๋ผ ์๊ฐํ๊ณ , ํ๋ํ๋ '์ง๋ฅ'์ ํตํด ์ธ๋ฅ๊ฐ ์ด์ ๊น์ง ํ์ง ๋ชปํ๋ ๋ฌธ์ ์ ํด๋ต์ ์ฐพ์ ์ ์์ ๊ฒ์ด๋ค. ๊ณผํ๊ธฐ์ ์ด ๊ณ ๋๋ก ๋ฐ๋ฌํ 21์ธ๊ธฐ๋ฅผ ์ด์๊ฐ ์ฐ๋ฆฌ ์์ด๋ค์๊ฒ ๊ฐ์ฅ ํ์ํ ๊ฒ์ ์ฌ๊ณ ๋ ฅ ํ๋ จ์ด๋ค. ์ฌ๊ณ ๋ ฅ ํ๋ จ์ ํตํด, ์ธ์
```
Experiments
In-context Few-Shots
Models | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) |
---|---|---|---|---|
HyperCLOVA[1] | 1.3B | 83.9 | 58.7 | 60.9 |
HyperCLOVA[1] | 6.9B | 83.8 | 67.5 | 59.3 |
HyperCLOVA[1] | 13.0B | 87.9 | 67.9 | 60.0 |
HyperCLOVA[1] | 39.0B | 88.0 | 71.4 | 61.6 |
HyperCLOVA[1] | 82.0B | 88.2 | 72.7 | 65.1 |
Ours | 6.0B | 87.8 | 78.0 | 64.3 |
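In-context few-shot evaluation feeds the model a handful of labeled examples inside the prompt and asks it to complete the label for a new input. The sketch below shows how such a prompt for NSMC (binary movie-review sentiment) might be assembled; the template, labels, and example reviews are hypothetical, not the exact ones used in our evaluation.

```python
# Hypothetical few-shot prompt construction for NSMC-style
# sentiment classification (labels: ๊ธ์  / ๋ถ์ ).
examples = [
    ("์ด ์ํ ์ ๋ง ์ฌ๋ฐ์ด์", "๊ธ์ "),
    ("์๊ฐ ๋ญ๋น์๋ค", "๋ถ์ "),
]
query = "๋ฐฐ์ฐ๋ค์ ์ฐ๊ธฐ๊ฐ ์ธ์์ ์ด์๋ค"

# Each shot is rendered as "๋ฆฌ๋ทฐ: <text>\n๊ฐ์ : <label>\n"; the prompt
# ends mid-pattern so the model completes the final label.
prompt = "".join(f"๋ฆฌ๋ทฐ: {text}\n๊ฐ์ : {label}\n" for text, label in examples)
prompt += f"๋ฆฌ๋ทฐ: {query}\n๊ฐ์ :"
print(prompt)
```

The completed label would then be read off the model's next few generated tokens.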
Finetuning / P-Tuning
Issues (#17) have been reported with our downstream evaluation.
The previously published performance table was removed because it could not be considered a fair comparison: the baselines used different algorithms, and their performance measurement methods could not be verified.
Please refer to the issue linked above for the original performance table and the troubleshooting results.
Limitations
KakaoBrain KoGPT was trained on raw data, a dataset known to contain profanity, lewd, politically charged, and other harsh language.
Therefore, KoGPT can generate socially unacceptable texts. As with all language models, it is hard to predict in advance how KoGPT will respond to particular prompts, and it may produce offensive content without warning.
Primarily Korean: KoGPT is primarily trained on Korean texts, and is best for classifying, searching, summarizing, or generating such texts.
By default, KoGPT performs worse on inputs that differ from the data distribution it was trained on, including non-Korean text as well as specific dialects of Korean that are not well represented in the training data.
If abnormal or socially unacceptable text is generated during testing, please send a "prompt" and the "generated text" to [email protected].
Please be sure to keep the above limitations in mind for any research, development, or testing that makes use of KoGPT.
Citation
If you apply this library or model to any project or research, please cite our code:
@misc{kakaobrain2021kogpt,
title = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
author = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
year = {2021},
howpublished = {\url{https://github.com/kakaobrain/kogpt}},
}
Contact
This is released as open source in the hope that it will be helpful to many research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to cooperate with us.
License
The source code of KakaoBrain KoGPT is licensed under the Apache 2.0 license.
The pretrained weights of KakaoBrain KoGPT are licensed under the CC-BY-NC-ND 4.0 license.
Please comply with the license terms when using the model, code, or pretrained weights. The full license texts are available in the Apache 2.0 and LICENSE.cc-by-nc-nd-4.0 files.
Obligation to use
While open source software may be free to use, that does not mean it is free of obligation. To determine whether your intended use of KoGPT complies with the Apache 2.0 (or CC-BY-NC-ND 4.0) license, please consult the license guide. If you violate the license, you may be subject to legal action such as prohibition of use or claims for damages.
References
[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).
Contribution
Disclaimer
The contribution section is not an official KakaoBrain product.
AK391's Web Demo on Huggingface Spaces
- see demo: https://huggingface.co/spaces/akhaliq/kogpt
- The web demo is integrated into Huggingface Spaces using Gradio.
- Contributors: AK391