KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT

GitHub: kakaobrain/kogpt (License: Apache 2.0)
Hugging Face: KoGPT-6B (License: CC BY-NC-ND 4.0)

Model Descriptions

KoGPT6B-ryan1.5b

Hyperparameter        Value
n_parameters          6,166,502,400
n_layers              28
d_model               4,096
d_ff                  16,384
n_heads               16
d_head                256
n_ctx                 2,048
n_vocab               64,512
Positional Encoding   Rotary Position Embedding (RoPE)
RoPE Dimensions       64
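
As a sanity check on the figures above, the sketch below recomputes the parameter count from the layer sizes. It assumes a GPT-J-style block layout (bias-free attention projections, a biased two-layer MLP, one LayerNorm per block, and an untied output head with bias); that layout is an assumption of this example and is not stated in this README. The names `n_layers`, `d_model`, `d_ff`, and `n_vocab` are labels chosen for the sketch.

```python
def gptj_style_parameter_count(n_layers, d_model, d_ff, n_vocab):
    """Count weights under an assumed GPT-J-style layout (not confirmed by the README)."""
    attention = 4 * d_model * d_model          # q, k, v, and output projections, no biases
    mlp = 2 * d_model * d_ff + d_ff + d_model  # two linear layers plus their biases
    layer_norm = 2 * d_model                   # scale and shift
    per_layer = attention + mlp + layer_norm
    embedding = n_vocab * d_model              # input token embeddings
    output_head = n_vocab * d_model + n_vocab  # untied output projection with bias
    final_norm = 2 * d_model
    return n_layers * per_layer + embedding + output_head + final_norm


total = gptj_style_parameter_count(n_layers=28, d_model=4096, d_ff=16384, n_vocab=64512)
print(f"{total:,}")  # 6,166,502,400

# Per-head width implied by 4,096 model dimensions split over 16 heads.
d_head = 4096 // 16
print(d_head)  # 256
```

Under these assumptions the total matches the 6,166,502,400 parameters listed in the table.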

Hardware requirements

KoGPT6B-ryan1.5b

GPU

Recommended minimum GPU hardware for this KoGPT revision:

  • At least 32 GB of GPU RAM

KoGPT6B-ryan1.5b-float16

GPU

Recommended minimum GPU hardware for this KoGPT revision:

  • Half precision requires NVIDIA GPUs based on the Volta, Turing, or Ampere architecture
  • At least 16 GB of GPU RAM
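
The 32 GB and 16 GB figures are easier to interpret next to the size of the weights alone. The following back-of-the-envelope sketch multiplies the 6,166,502,400 parameters from the model description by 4 bytes (float32) and 2 bytes (float16). Reading the gap between those totals and the stated minimums as headroom for activations and framework overhead is an interpretation of this example, not something the README states.

```python
N_PARAMETERS = 6_166_502_400  # parameter count from the model description


def weight_bytes(n_parameters, bytes_per_parameter):
    """Memory needed to hold the weights only, in bytes."""
    return n_parameters * bytes_per_parameter


for name, width in (("float32", 4), ("float16", 2)):
    size = weight_bytes(N_PARAMETERS, width)
    print(f"{name}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

The weights alone come to roughly 24.7 GB in float32 and 12.3 GB in float16, which is consistent with the 32 GB and 16 GB minimums.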

Usage

prompt

python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug
python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)> 
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상

prompt>  
...

python

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and register KoGPT's special tokens.
tokenizer = AutoTokenizer.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
  bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
# Load the weights in their stored dtype, then move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
  'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
  pad_token_id=tokenizer.eos_token_id,
  torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()  # inference mode (disables dropout)

# Korean prompt, roughly: "... that humanity has so far been unable to solve, through 'intelligence' that thinks and acts like a human"
prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():
  tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
  # max_length counts the prompt tokens as well as the newly generated ones.
  gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
  generated = tokenizer.batch_decode(gen_tokens)[0]

# Output is sampled, so it varies between runs. Example:
# 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
print(generated)
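
Both the CLI and the `generate` call take a `temperature` value (0.8 in the examples). As a rough illustration of what temperature does during sampling in general, the sketch below rescales a list of scores before applying softmax. It is a generic, self-contained example and not code from KoGPT or `transformers`; the function name `softmax_with_temperature` and the logit values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=0.8):
    """Convert raw scores into sampling probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.5, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution and make sampled text more varied.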

Experiments

In-context Few-Shots

Models          #params   NSMC (Acc.)   YNAT (F1)   KLUE-STS (F1)
HyperCLOVA[1]   1.3B      83.9          58.7        60.9
HyperCLOVA[1]   6.9B      83.8          67.5        59.3
HyperCLOVA[1]   13.0B     87.9          67.9        60.0
HyperCLOVA[1]   39.0B     88.0          71.4        61.6
HyperCLOVA[1]   82.0B     88.2          72.7        65.1
Ours            6.0B      87.8          78.0        64.3
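
The table reports in-context few-shot results, where labeled demonstrations are placed in the prompt and the model completes the final label. The README does not give the prompts used, so the sketch below only illustrates the general idea. The template, the field names `Review`/`Sentiment`, the helper `build_few_shot_prompt`, and the English demonstrations are all hypothetical; a real NSMC prompt would use Korean reviews.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate labeled demonstrations and leave the final label blank."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demos = [  # made-up demonstrations, not taken from NSMC
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(demos, "A beautiful film with a weak ending.")
print(prompt)
```

The resulting string would be passed to the model as the prompt, and the text generated after the final `Sentiment:` read off as the prediction.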

Finetuning / P-Tuning

Issues with our downstream evaluation have been reported (#17).

We removed the previously published performance evaluation table because it was not a fair comparison: the algorithms being compared were different, and the performance measurement method could not be confirmed.

See issue #17 for the previous performance evaluation table and the troubleshooting results.

Limitations

KakaoBrain KoGPT was trained on raw data that is known to contain, and was not filtered for, profanity, lewd or politically charged content, and other harsh language. KoGPT can therefore generate socially unacceptable text. As with all language models, it is difficult to predict in advance how KoGPT will respond to particular prompts, and it may produce offensive content without warning.

Primarily Korean: KoGPT is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text. By default, KoGPT performs worse on inputs that differ from its training data distribution, including non-Korean text and dialects of Korean that are not well represented in the training data.

Please keep these limitations in mind when using KoGPT for research, development, or testing. If abnormal or socially unacceptable text is generated during testing, please send the "prompt" and the "generated text" to [email protected].

Citation

If you use this library or model in any project or research, please cite our code:

@misc{kakaobrain2021kogpt,
  title         = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
  author        = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
  year          = {2021},
  howpublished  = {\url{https://github.com/kakaobrain/kogpt}},
}

Contact

KoGPT is released as open source in the hope that it will be helpful to research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to cooperate with us.

[email protected]

License

The source code of KakaoBrain KoGPT is licensed under the Apache 2.0 License.
The pretrained weights of KakaoBrain KoGPT are licensed under the CC-BY-NC-ND 4.0 License.

When using the model, code, or pretrained weights, please comply with the license terms. The full license texts are available in the Apache 2.0 and LICENSE.cc-by-nc-nd-4.0 files.

Obligation to use

While open source software may be free to use, that does not mean it is free of obligation. Before using KoGPT, please consult the license guide to determine whether your intended use complies with the Apache 2.0 (or CC-BY-NC-ND 4.0) license. If you violate the license, you may be subject to legal action, such as prohibition of use or a claim for damages, depending on the use.

References

[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).


Contribution

Disclaimer

The contribution section is not an official KakaoBrain product.

AK391's Web Demo on Huggingface Spaces
