
Ko-Sentence-BERT

Korean SentenceBERT: Sentence Embeddings using Siamese ETRI KoBERT-Networks

Note
๋‹ค์–‘ํ•œ ๋ฌธ์žฅ ์ž„๋ฒ ๋”ฉ ๋ชจ๋ธ ๋ฐ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ ๋งํฌ๋ฅผ ์ฐธ๊ณ ํ•ด์ฃผ์„ธ์š”.
[Sentence-Embedding-Is-All-You-Need]

Installation

  • ETRI KorBERT only runs on transformers 2.4.1–2.8.0, while Sentence-BERT requires transformers 3.1.0 or later, so the bundled libraries were modified to make them work together.
  • Because the Hugging Face transformers, sentence-transformers, and tokenizers library code is patched directly, using a virtual environment is recommended.
  • The Docker image used is published on Docker Hub.
  • Training was done with ETRI KoBERT; this repository does not distribute ETRI KoBERT itself.
  • A version built on SKT KoBERT is available in the following repository.
git clone https://github.com/BM-K/KoSentenceBERT.git
python -m venv .KoSBERT
. .KoSBERT/bin/activate
pip install -r requirements.txt
  • Move the patched transformer, tokenizers, and sentence_transformers directories into .KoSBERT/lib/python3.7/site-packages/.
  • The ETRI_KoBERT model and tokenizer must be present inside the KoSentenceBERT directory.
  • The ETRI model and tokenizer are loaded as in the following example:
from transformers import BertModel
from ETRI_tok.tokenization_etri_eojeol import BertTokenizer

# Inside the modified sentence_transformers model code:
self.auto_model = BertModel.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch')
self.tokenizer = BertTokenizer.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch/vocab.txt', do_lower_case=False)
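
To sanity-check that the checkpoint and vocabulary are wired up correctly, you can load them outside the library code as well (a minimal sketch, assuming the ETRI files sit at the paths above):

from transformers import BertModel
from ETRI_tok.tokenization_etri_eojeol import BertTokenizer

# Load the ETRI checkpoint and eojeol tokenizer directly and tokenize a sample sentence.
model = BertModel.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch')
tokenizer = BertTokenizer.from_pretrained('./ETRI_KoBERT/003_bert_eojeol_pytorch/vocab.txt', do_lower_case=False)
print(tokenizer.tokenize('한 남자가 음식을 먹는다.'))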

Train Models

  • ๋ชจ๋ธ ํ•™์Šต์„ ์›ํ•˜์‹œ๋ฉด KoSentenceBERT ๋””๋ ‰ํ† ๋ฆฌ ์•ˆ์— KorNLUDatasets์ด ์กด์žฌํ•˜์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • STS ํ•™์Šต ์‹œ ๋ชจ๋ธ ๊ตฌ์กฐ์— ๋งž๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ •ํ•˜์—ฌ ์‚ฌ์šฉํ•˜์˜€์œผ๋ฉฐ, ๋ฐ์ดํ„ฐ์™€ ํ•™์Šต ๋ฐฉ๋ฒ•์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค :

    KoSentenceBERT/KorNLUDatasets/KorSTS/tune_test.tsv

    A subset of the STS test dataset
python training_nli.py      # Train on NLI data only
python training_sts.py      # Train on STS data only
python con_training_sts.py  # Train on NLI data, then fine-tune on STS data
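
For orientation, here is a minimal sketch of what the NLI-then-STS stage (con_training_sts.py) boils down to, assuming the standard sentence-transformers training API; the file path, batch size, and epoch count are illustrative, not the repository's exact settings:

import csv
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.readers import InputExample

# Start from the NLI-trained model and fine-tune it on KorSTS.
model = SentenceTransformer('./output/training_nli_ETRI_KoBERT-003_bert_eojeol')

train_samples = []
with open('KorNLUDatasets/KorSTS/sts-train.tsv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
        score = float(row['score']) / 5.0  # normalize gold scores to [0, 1]
        train_samples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=score))

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)  # regress cosine similarity onto the gold score

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=4,
          warmup_steps=100,
          output_path='./output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol')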

Pre-Trained Models

pooling mode๋Š” MEAN-strategy๋ฅผ ์‚ฌ์šฉํ•˜์˜€์œผ๋ฉฐ, ํ•™์Šต์‹œ ๋ชจ๋ธ์€ output ๋””๋ ‰ํ† ๋ฆฌ์— ์ €์žฅ ๋ฉ๋‹ˆ๋‹ค.

๋””๋ ‰ํ† ๋ฆฌ ํ•™์Šต๋ฐฉ๋ฒ•
training_nli_ETRI_KoBERT-003_bert_eojeol Only Train NLI
training_sts_ETRI_KoBERT-003_bert_eojeol Only Train STS
training_nli_sts_ETRI_KoBERT-003_bert_eojeol STS + NLI
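
For reference, this is roughly how a MEAN-pooled model is assembled from sentence-transformers building blocks (a sketch; models.Transformer stands in here for the repository's modified ETRI KoBERT loader):

from sentence_transformers import SentenceTransformer, models

# Token-level encoder followed by a mean-pooling layer over token embeddings.
word_embedding_model = models.Transformer('./ETRI_KoBERT/003_bert_eojeol_pytorch')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,   # MEAN strategy
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])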

Performance

  • Fixed seed, evaluated on the test set
Model      Cosine Pearson  Cosine Spearman  Euclidean Pearson  Euclidean Spearman  Manhattan Pearson  Manhattan Spearman  Dot Pearson  Dot Spearman
NLI        67.96           70.45            71.06              70.48               71.17              70.51               64.87        63.04
STS        80.43           79.99            78.18              78.03               78.13              77.99               73.73        73.40
STS + NLI  80.10           80.42            79.14              79.28               79.08              79.22               74.46        74.16
  • Performance comparison with other models [KLUE-PLMs].

Application Examples

  • ์ƒ์„ฑ ๋œ ๋ฌธ์žฅ ์ž„๋ฒ ๋”ฉ์„ ๋‹ค์šด ์ŠคํŠธ๋ฆผ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋ช‡ ๊ฐ€์ง€ ์˜ˆ๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
  • STS + NLI pretrained ๋ชจ๋ธ์„ ํ†ตํ•ด ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Semantic Search

SemanticSearch.py๋Š” ์ฃผ์–ด์ง„ ๋ฌธ์žฅ๊ณผ ์œ ์‚ฌํ•œ ๋ฌธ์žฅ์„ ์ฐพ๋Š” ์ž‘์—…์ž…๋‹ˆ๋‹ค.
๋จผ์ € Corpus์˜ ๋ชจ๋“  ๋ฌธ์žฅ์— ๋Œ€ํ•œ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

from sentence_transformers import SentenceTransformer, util
import numpy as np

model_path = './output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol'

embedder = SentenceTransformer(model_path)

# Corpus with example sentences
corpus = ['ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋นต ํ•œ ์กฐ๊ฐ์„ ๋จน๋Š”๋‹ค.',
          '๊ทธ ์—ฌ์ž๊ฐ€ ์•„์ด๋ฅผ ๋Œ๋ณธ๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋ง์„ ํƒ„๋‹ค.',
          'ํ•œ ์—ฌ์ž๊ฐ€ ๋ฐ”์ด์˜ฌ๋ฆฐ์„ ์—ฐ์ฃผํ•œ๋‹ค.',
          '๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋‹ด์œผ๋กœ ์‹ธ์ธ ๋•…์—์„œ ๋ฐฑ๋งˆ๋ฅผ ํƒ€๊ณ  ์žˆ๋‹ค.',
          '์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค.',
          '์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค.']

corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

# Query sentences:
queries = ['ํ•œ ๋‚จ์ž๊ฐ€ ํŒŒ์Šคํƒ€๋ฅผ ๋จน๋Š”๋‹ค.',
           '๊ณ ๋ฆด๋ผ ์˜์ƒ์„ ์ž…์€ ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค.',
           '์น˜ํƒ€๊ฐ€ ๋“คํŒ์„ ๊ฐ€๋กœ ์งˆ๋Ÿฌ ๋จน์ด๋ฅผ ์ซ“๋Š”๋‹ค.']

# Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity
top_k = 5
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    cos_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)[0]
    cos_scores = cos_scores.cpu()

    # We use np.argpartition to only partially sort the top_k results
    top_results = np.argpartition(-cos_scores, range(top_k))[0:top_k]

    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")

    for idx in top_results[0:top_k]:
        print(corpus[idx].strip(), "(Score: %.4f)" % (cos_scores[idx]))
        


๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค :

========================


Query: ํ•œ ๋‚จ์ž๊ฐ€ ํŒŒ์Šคํƒ€๋ฅผ ๋จน๋Š”๋‹ค.

Top 5 most similar sentences in corpus:
ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค. (Score: 0.7557)
ํ•œ ๋‚จ์ž๊ฐ€ ๋นต ํ•œ ์กฐ๊ฐ์„ ๋จน๋Š”๋‹ค. (Score: 0.6464)
ํ•œ ๋‚จ์ž๊ฐ€ ๋‹ด์œผ๋กœ ์‹ธ์ธ ๋•…์—์„œ ๋ฐฑ๋งˆ๋ฅผ ํƒ€๊ณ  ์žˆ๋‹ค. (Score: 0.2565)
ํ•œ ๋‚จ์ž๊ฐ€ ๋ง์„ ํƒ„๋‹ค. (Score: 0.2333)
๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค. (Score: 0.1792)


========================


Query: ๊ณ ๋ฆด๋ผ ์˜์ƒ์„ ์ž…์€ ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค.

Top 5 most similar sentences in corpus:
์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค. (Score: 0.6732)
์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค. (Score: 0.3401)
๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค. (Score: 0.1037)
ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค. (Score: 0.0617)
๊ทธ ์—ฌ์ž๊ฐ€ ์•„์ด๋ฅผ ๋Œ๋ณธ๋‹ค. (Score: 0.0466)


========================


Query: ์น˜ํƒ€๊ฐ€ ๋“คํŒ์„ ๊ฐ€๋กœ ์งˆ๋Ÿฌ ๋จน์ด๋ฅผ ์ซ“๋Š”๋‹ค.

Top 5 most similar sentences in corpus:
์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค. (Score: 0.7164)
๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค. (Score: 0.3216)
์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค. (Score: 0.2071)
ํ•œ ๋‚จ์ž๊ฐ€ ๋นต ํ•œ ์กฐ๊ฐ์„ ๋จน๋Š”๋‹ค. (Score: 0.1089)
ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค. (Score: 0.0724)

Clustering

Clustering.py๋Š” ๋ฌธ์žฅ ์ž„๋ฒ ๋”ฉ ์œ ์‚ฌ์„ฑ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์œ ์‚ฌํ•œ ๋ฌธ์žฅ์„ ํด๋Ÿฌ์Šคํ„ฐ๋งํ•˜๋Š” ์˜ˆ๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
์ด์ „๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋จผ์ € ๊ฐ ๋ฌธ์žฅ์— ๋Œ€ํ•œ ์ž„๋ฒ ๋”ฉ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

from sentence_transformers import SentenceTransformer, util

model_path = './output/training_nli_sts_ETRI_KoBERT-003_bert_eojeol'

embedder = SentenceTransformer(model_path)

# Corpus with example sentences
corpus = ['ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋นต ํ•œ ์กฐ๊ฐ์„ ๋จน๋Š”๋‹ค.',
          '๊ทธ ์—ฌ์ž๊ฐ€ ์•„์ด๋ฅผ ๋Œ๋ณธ๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋ง์„ ํƒ„๋‹ค.',
          'ํ•œ ์—ฌ์ž๊ฐ€ ๋ฐ”์ด์˜ฌ๋ฆฐ์„ ์—ฐ์ฃผํ•œ๋‹ค.',
          '๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ๋‹ด์œผ๋กœ ์‹ธ์ธ ๋•…์—์„œ ๋ฐฑ๋งˆ๋ฅผ ํƒ€๊ณ  ์žˆ๋‹ค.',
          '์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค.',
          '์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค.',
          'ํ•œ ๋‚จ์ž๊ฐ€ ํŒŒ์Šคํƒ€๋ฅผ ๋จน๋Š”๋‹ค.',
          '๊ณ ๋ฆด๋ผ ์˜์ƒ์„ ์ž…์€ ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค.',
          '์น˜ํƒ€๊ฐ€ ๋“คํŒ์„ ๊ฐ€๋กœ ์งˆ๋Ÿฌ ๋จน์ด๋ฅผ ์ซ“๋Š”๋‹ค.']

corpus_embeddings = embedder.encode(corpus)

# Then, we perform k-means clustering using sklearn:
from sklearn.cluster import KMeans

num_clusters = 5
clustering_model = KMeans(n_clusters=num_clusters)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_

clustered_sentences = [[] for i in range(num_clusters)]
for sentence_id, cluster_id in enumerate(cluster_assignment):
    clustered_sentences[cluster_id].append(corpus[sentence_id])

for i, cluster in enumerate(clustered_sentences):
    print("Cluster ", i+1)
    print(cluster)
    print("")

๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค :

Cluster  1
['๋‘ ๋‚จ์ž๊ฐ€ ์ˆ˜๋ ˆ๋ฅผ ์ˆฒ ์†์œผ๋กœ ๋ฐ€์—ˆ๋‹ค.', '์น˜ํƒ€ ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋จน์ด ๋’ค์—์„œ ๋‹ฌ๋ฆฌ๊ณ  ์žˆ๋‹ค.', '์น˜ํƒ€๊ฐ€ ๋“คํŒ์„ ๊ฐ€๋กœ ์งˆ๋Ÿฌ ๋จน์ด๋ฅผ ์ซ“๋Š”๋‹ค.']

Cluster  2
['ํ•œ ๋‚จ์ž๊ฐ€ ๋ง์„ ํƒ„๋‹ค.', 'ํ•œ ๋‚จ์ž๊ฐ€ ๋‹ด์œผ๋กœ ์‹ธ์ธ ๋•…์—์„œ ๋ฐฑ๋งˆ๋ฅผ ํƒ€๊ณ  ์žˆ๋‹ค.']

Cluster  3
['ํ•œ ๋‚จ์ž๊ฐ€ ์Œ์‹์„ ๋จน๋Š”๋‹ค.', 'ํ•œ ๋‚จ์ž๊ฐ€ ๋นต ํ•œ ์กฐ๊ฐ์„ ๋จน๋Š”๋‹ค.', 'ํ•œ ๋‚จ์ž๊ฐ€ ํŒŒ์Šคํƒ€๋ฅผ ๋จน๋Š”๋‹ค.']

Cluster  4
['๊ทธ ์—ฌ์ž๊ฐ€ ์•„์ด๋ฅผ ๋Œ๋ณธ๋‹ค.', 'ํ•œ ์—ฌ์ž๊ฐ€ ๋ฐ”์ด์˜ฌ๋ฆฐ์„ ์—ฐ์ฃผํ•œ๋‹ค.']

Cluster  5
['์›์ˆญ์ด ํ•œ ๋งˆ๋ฆฌ๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•œ๋‹ค.', '๊ณ ๋ฆด๋ผ ์˜์ƒ์„ ์ž…์€ ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ๋“œ๋Ÿผ์„ ์—ฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค.']

Downstream Tasks Demo

(demo media from the original README omitted)

Citing

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}
@article{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    journal= "arXiv preprint arXiv:2004.09813",
    month = "04",
    year = "2020",
    url = "http://arxiv.org/abs/2004.09813",
}
@article{ham2020kornli,
  title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
  author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
  journal={arXiv preprint arXiv:2004.03289},
  year={2020}
}
