• Stars
    star
    126
  • Rank 282,938 (Top 6 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created over 7 years ago
  • Updated almost 6 years ago

Repository Details

Customized KoNLPy - Korean Natural Language Processing Toolkit KoNLPy wrapping code

customized KoNLPy

A customized version of KoNLPy, a Python package for Korean natural language processing.

For words it knows for certain, customized_KoNLPy tokenizes and POS-tags a given eojeol (a space-delimited unit of Korean text) directly from those known words, without passing it through the underlying library. To do this, it performs template-based tokenization.

Dictionary: {'아이오아이': 'Noun', '는': 'Josa'}
Template: Noun + Josa

Given the word list and template above, the eojeol '아이오아이는' is split into [('아이오아이', 'Noun'), ('는', 'Josa')].
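
The idea of template-based tokenization can be sketched in a few lines of plain Python. This is only an illustration, not the code ckonlpy actually runs: the function name `split_by_template` is hypothetical, and the sketch assumes each dictionary word has exactly one tag.

```python
def split_by_template(eojeol, dictionary, template):
    """Return every way to cut `eojeol` into known words whose tags follow `template`."""
    if not template:
        # A split is valid only if the eojeol is used up together with the template.
        return [[]] if not eojeol else []
    results = []
    for end in range(1, len(eojeol) + 1):
        word = eojeol[:end]
        if dictionary.get(word) == template[0]:
            # The prefix matches the first tag; try to match the remainder.
            for rest in split_by_template(eojeol[end:], dictionary, template[1:]):
                results.append([(word, template[0])] + rest)
    return results


dictionary = {'아이오아이': 'Noun', '는': 'Josa'}
print(split_by_template('아이오아이는', dictionary, ('Noun', 'Josa')))
# [[('아이오아이', 'Noun'), ('는', 'Josa')]]
```

An eojeol that cannot be covered completely by known words in the template's order yields an empty list.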

Install

$ git clone https://github.com/lovit/customized_konlpy.git

$ pip install customized_konlpy

Requires

  • JPype >= 0.6.1
  • KoNLPy >= 0.4.4

Usage

Part of speech tagging

As in KoNLPy, call Twitter.pos(phrase). For each eojeol, if a word known to the user dictionary is recognized, the eojeol is split by the customized_tagger; eojeols made up of words that are not in the user dictionary are handled by the Twitter morphological analyzer.

twitter.pos('우리아이오아이는 이뻐요')
[('우리', 'Noun'), ('아이오', 'Noun'), ('아이', 'Noun'), ('는', 'Josa'), ('이뻐', 'Adjective'), ('요', 'Eomi')]

Because '아이오아이' was not a known word, the Twitter analyzer fails to recognize it properly. After adding the words to the user dictionary as described below, the same call gives the following results.

twitter.pos('우리아이오아이는 이뻐요')
[('우리', 'Modifier'), ('아이오아이', 'Noun'), ('는', 'Josa'), ('이뻐', 'Adjective'), ('요', 'Eomi')]
twitter.pos('트와이스tt는 좋아요')
[('트와이스', 'Noun'), ('tt', 'Noun'), ('는', 'Josa'), ('좋', 'Adjective'), ('아요', 'Eomi')]
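
The per-eojeol dispatch described in this section can be pictured roughly as follows. This is a sketch under assumptions, not the library's code: `tag_phrase`, `template_tag`, and `fallback_tag` are hypothetical names, and the two taggers are stand-ins so the example runs without a JVM.

```python
def tag_phrase(phrase, template_tag, fallback_tag):
    """Tag each eojeol with the template tagger, falling back when it finds nothing."""
    tagged = []
    for eojeol in phrase.split():
        result = template_tag(eojeol)
        if not result:
            # No known word was recognized, so hand the eojeol to the base analyzer.
            result = fallback_tag(eojeol)
        tagged.extend(result)
    return tagged


# Stand-in taggers for illustration only.
KNOWN = {'아이오아이는': [('아이오아이', 'Noun'), ('는', 'Josa')]}


def template_tag(eojeol):
    return KNOWN.get(eojeol, [])


def fallback_tag(eojeol):
    return [(eojeol, 'Unknown')]


print(tag_phrase('아이오아이는 이뻐요', template_tag, fallback_tag))
# [('아이오아이', 'Noun'), ('는', 'Josa'), ('이뻐요', 'Unknown')]
```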

Add words to dictionary

Twitter in ckonlpy.tag accepts user dictionary entries through add_dictionary, given as a str or a list of str.

from ckonlpy.tag import Twitter

twitter = Twitter()
twitter.add_dictionary('아이오아이', 'Noun')
twitter.add_dictionary(['트와이스', 'tt'], 'Noun')

To add a part of speech (word class) that the Twitter Korean analyzer does not use, you must set force=True.

twitter.add_dictionary('lovit', 'Name', force=True)
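
A toy model of the force flag may make the rule easier to remember. Everything here is hypothetical: the class `UserDictionary`, its `add` method, and the choice of `ValueError` are assumptions made for the sketch and are not taken from ckonlpy.

```python
class UserDictionary:
    """Toy dictionary that rejects unknown tags unless force=True."""

    def __init__(self, known_tags):
        self.known_tags = set(known_tags)
        self.words = {}  # tag -> set of words

    def add(self, words, tag, force=False):
        if tag not in self.known_tags:
            if not force:
                raise ValueError(f'{tag} is not a known tag; pass force=True to add it')
            # force=True registers the new tag alongside the built-in ones.
            self.known_tags.add(tag)
        if isinstance(words, str):
            words = [words]
        self.words.setdefault(tag, set()).update(words)


user_dict = UserDictionary({'Noun', 'Josa'})
user_dict.add(['트와이스', 'tt'], 'Noun')
user_dict.add('lovit', 'Name', force=True)
print(sorted(user_dict.words))  # ['Name', 'Noun']
```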

Add template to customized tagger

Templates can be added to the template-based tokenizer while the code is running. The list of templates currently in use can be checked as follows.

twitter.template_tagger.templates
[('Noun', 'Josa'), ('Modifier', 'Noun'), ('Modifier', 'Noun', 'Josa')]

A template is given as a tuple of str.

twitter.template_tagger.add_a_template(('Noun', 'Noun', 'Josa'))

Set templates tagger selector

Even with templates, more than one candidate can come out. You can design your own function that selects the best of several candidates. Defining a few scoring criteria and giving each a weight, as below, is the approach the Twitter analyzer uses; I think it is a very good approach because it is intuitive and tunable.

my_weights = [
    ('num_nouns', -0.1),
    ('num_words', -0.2),
    ('no_noun', -1),
    ('len_sum_of_nouns', 0.2)
]

def my_evaluate_function(candidate):
    num_nouns = len([word for word, pos, begin, e in candidate if pos == 'Noun'])
    num_words = len(candidate)
    has_no_nouns = (num_nouns == 0)
    len_sum_of_nouns = 0 if has_no_nouns else sum(
        (len(word) for word, pos, _, _ in candidate if pos == 'Noun'))

    scores = (num_nouns, num_words, has_no_nouns, len_sum_of_nouns)
    score = sum((score * weight for score, (_, weight) in zip(scores, my_weights)))
    return score

Define my_weights and my_evaluate_function as in the example above and pass them to twitter.set_evaluator(); the best candidate is then selected according to that function.

twitter.set_evaluator(my_weights, my_evaluate_function)
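
To see how these weights rank candidates, the scoring can be run by hand on two splits of '아이오아이는'. The two candidate lists are made up for illustration, using the (word, tag, begin, end) layout that the evaluate function above unpacks. The function is a compact restatement of the one above so that the sketch runs on its own.

```python
my_weights = [
    ('num_nouns', -0.1),
    ('num_words', -0.2),
    ('no_noun', -1),
    ('len_sum_of_nouns', 0.2),
]


def my_evaluate_function(candidate):
    nouns = [word for word, pos, _, _ in candidate if pos == 'Noun']
    # Feature order must match the order of my_weights.
    features = (len(nouns), len(candidate), len(nouns) == 0, sum(len(w) for w in nouns))
    return sum(value * weight for value, (_, weight) in zip(features, my_weights))


# Hypothetical candidates: (word, tag, begin, end)
one_noun = [('아이오아이', 'Noun', 0, 5), ('는', 'Josa', 5, 6)]
two_nouns = [('아이오', 'Noun', 0, 3), ('아이', 'Noun', 3, 5), ('는', 'Josa', 5, 6)]

print(round(my_evaluate_function(one_noun), 2))   # 0.5
print(round(my_evaluate_function(two_nouns), 2))  # 0.2

best = max([one_noun, two_nouns], key=my_evaluate_function)
print(best is one_noun)  # True
```

With these weights, fewer and longer nouns score higher, so the single five-character noun wins over the two shorter ones.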

Postprocessor

Post-processing is available for passwords, stopwords, passtags, and word replacement.

Only the words and (word, tag) pairs registered in passwords are output.

from ckonlpy.tag import Postprocessor

passwords = {'아이오아이', ('정말', 'Noun')}
postprocessor = Postprocessor(twitter, passwords = passwords)
postprocessor.pos('우리아이오아이는 정말 이뻐요')
# [('아이오아이', 'Noun'), ('정말', 'Noun')]

Words and (word, tag) pairs registered in stopwords are not output.

stopwords = {'는'}
postprocessor = Postprocessor(twitter, stopwords = stopwords)
postprocessor.pos('우리아이오아이는 정말 이뻐요')
# [('우리', 'Modifier'), ('아이오아이', 'Noun'), ('정말', 'Noun'), ('이뻐', 'Adjective'), ('요', 'Eomi')]

If specific tags are given as passtags, only words with those tags are output.

passtags = {'Noun'}
postprocessor = Postprocessor(twitter, passtags = passtags)
postprocessor.pos('우리아이오아이는 정말 이뻐요')
# [('아이오아이', 'Noun'), ('정말', 'Noun')]

Define the words or (word, tag) pairs to replace as a dict, and the words are replaced in the tagged output.

replace = {'아이오아이': '아이돌', ('이뻐', 'Adjective'): '예쁘다'}
postprocessor = Postprocessor(twitter, replace = replace)
postprocessor.pos('우리아이오아이는 정말 이뻐요')
# [('우리', 'Modifier'), ('아이돌', 'Noun'), ('는', 'Josa'), ('정말', 'Noun'), ('예쁘다', 'Adjective'), ('요', 'Eomi')]
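
The four filters above can be imitated on an already tagged list, which is handy for checking a word set without starting the tagger. This is a plain-Python sketch, not the Postprocessor implementation; the function name `postprocess` and the order in which the filters are applied are assumptions.

```python
def postprocess(tagged, passwords=None, stopwords=None, passtags=None, replace=None):
    """Filter and rewrite a list of (word, tag) pairs."""
    result = []
    for word, tag in tagged:
        pair = (word, tag)
        if passwords and word not in passwords and pair not in passwords:
            continue  # keep only registered words or pairs
        if stopwords and (word in stopwords or pair in stopwords):
            continue  # drop registered words or pairs
        if passtags and tag not in passtags:
            continue  # keep only the requested tags
        if replace:
            # A (word, tag) key takes precedence over a bare word key.
            word = replace.get(pair, replace.get(word, word))
        result.append((word, tag))
    return result


tagged = [('우리', 'Modifier'), ('아이오아이', 'Noun'), ('는', 'Josa'),
          ('정말', 'Noun'), ('이뻐', 'Adjective'), ('요', 'Eomi')]

print(postprocess(tagged, passwords={'아이오아이', ('정말', 'Noun')}))
# [('아이오아이', 'Noun'), ('정말', 'Noun')]
print(postprocess(tagged, replace={'아이오아이': '아이돌', ('이뻐', 'Adjective'): '예쁘다'}))
# [('우리', 'Modifier'), ('아이돌', 'Noun'), ('는', 'Josa'), ('정말', 'Noun'), ('예쁘다', 'Adjective'), ('요', 'Eomi')]
```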

To bind consecutive words into a single word, pass ngrams as nested tuples or as tuples of str. An ngram given as a tuple of str is recognized as a Noun.

ngrams = [(('미스', '함무라비'), 'Noun'), ('바람', '의', '나라')]
postprocessor = Postprocessor(twitter, ngrams = ngrams)
postprocessor.pos('미스 함무라비는 재밌는 드라마입니다')
# [('미스 - 함무라비', 'Noun'), ('는', 'Josa'), ('재밌는', 'Adjective'), ('드라마', 'Noun'), ('입니', 'Adjective'), ('다', 'Eomi')]
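
The ngram step can be pictured as a left-to-right scan that replaces a matching run of words with one joined token. The sketch below is an assumption about how such merging could work, not the library's code: `merge_ngrams` is a hypothetical name, and the ' - ' joiner is copied from the output shown above.

```python
def merge_ngrams(tagged, ngrams):
    """Merge runs of consecutive words that match a registered ngram."""
    # Normalize both accepted forms to {words: tag}; bare tuples default to 'Noun'.
    lookup = {}
    for ngram in ngrams:
        if isinstance(ngram[0], tuple):
            words, tag = ngram
        else:
            words, tag = ngram, 'Noun'
        lookup[tuple(words)] = tag

    merged, i = [], 0
    while i < len(tagged):
        for words, tag in lookup.items():
            window = tuple(word for word, _ in tagged[i:i + len(words)])
            if window == words:
                merged.append((' - '.join(words), tag))
                i += len(words)
                break
        else:
            # No ngram starts at this position; keep the token unchanged.
            merged.append(tagged[i])
            i += 1
    return merged


ngrams = [(('미스', '함무라비'), 'Noun'), ('바람', '의', '나라')]
tagged = [('미스', 'Noun'), ('함무라비', 'Noun'), ('는', 'Josa')]
print(merge_ngrams(tagged, ngrams))
# [('미스 - 함무라비', 'Noun'), ('는', 'Josa')]
```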

Loading wordset

utils has functions for easily loading stopwords, passwords, and replace word pairs that have been saved to files.

load_wordset returns a set of str or a set of tuple. The example passwords.txt looks like this. A word and its tag are separated by a single space. stopwords.txt uses the same format.

아이오아이
아이오아이 Noun
공연

Example code using load_wordset:

from ckonlpy.utils import load_wordset

passwords = load_wordset('./passwords.txt')
print(passwords) # {('아이오아이', 'Noun'), '아이오아이', '공연'}

stopwords = load_wordset('./stopwords.txt')
print(stopwords) # {'은', '는', ('이', 'Josa')}

Replacement word pairs are tab-separated. If the word to be replaced has a POS tag, the word and the tag are separated by a single space.

str\tstr
str str\tstr

Below is an example replacewords.txt.

아빠	아버지
엄마 Noun	어머니

Example code using load_replace_wordpair:

from ckonlpy.utils import load_replace_wordpair

replace = load_replace_wordpair('./replacewords.txt')
print(replace) # {'아빠': '아버지', ('엄마', 'Noun'): '어머니'}

The words of an ngram are separated by single spaces, and the tag of the ngram is separated from the words by a tab.

str str
str str\tstr

Below is an example ngrams.txt.

바람 의 나라
미스 함무라비	Noun

Example code using load_ngram:

from ckonlpy.utils import load_ngram

ngrams = load_ngram('./ngrams.txt')
print(ngrams) # [('바람', '의', '나라'), (('미스', '함무라비'), 'Noun')]
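
The three file formats are simple enough to parse by hand, which can help when checking a file before loading it. The parsers below are a sketch of the formats as described in this section, not the code behind load_wordset, load_replace_wordpair, or load_ngram; the function names are hypothetical and each takes an iterable of lines instead of a file path.

```python
def parse_wordset(lines):
    """'word' becomes a str; 'word tag' becomes a (word, tag) tuple."""
    items = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 1:
            items.add(parts[0])
        elif len(parts) == 2:
            items.add((parts[0], parts[1]))
    return items


def parse_replace_pairs(lines):
    """'source<TAB>target', where source is 'word' or 'word tag'."""
    pairs = {}
    for line in lines:
        line = line.rstrip('\n')
        if '\t' not in line:
            continue  # skip blank or malformed lines
        source, target = line.split('\t', 1)
        key = source.split()
        pairs[key[0] if len(key) == 1 else tuple(key)] = target
    return pairs


def parse_ngrams(lines):
    """'w1 w2 ...' or 'w1 w2 ...<TAB>tag'."""
    ngrams = []
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue
        if '\t' in line:
            words, tag = line.split('\t', 1)
            ngrams.append((tuple(words.split()), tag))
        else:
            ngrams.append(tuple(line.split()))
    return ngrams


print(parse_replace_pairs(['아빠\t아버지', '엄마 Noun\t어머니']))
# {'아빠': '아버지', ('엄마', 'Noun'): '어머니'}
print(parse_ngrams(['바람 의 나라', '미스 함무라비\tNoun']))
# [('바람', '의', '나라'), (('미스', '함무라비'), 'Noun')]
```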

0.0.6+ vs 0.0.5x

Some variable and function names, and some variable types, from 0.0.5x have been changed.

Before -> After
ckonlpy.tag.Twitter._loaded_twitter_default_dictionary -> ckonlpy.tag.Twitter.use_twitter_dictionary
ckonlpy.tag.Twitter._dictionary -> ckonlpy.tag.Twitter.dictionary
ckonlpy.tag.Twitter._customized_tagger -> ckonlpy.tag.Twitter.template_tagger
ckonlpy.tag.Postprocessor.tag -> ckonlpy.tag.Postprocessor.pos
ckonlpy.custom_tag.SimpleSelector -> ckonlpy.custom_tag.SimpleEvaluator
ckonlpy.custom_tag.SimpleSelector.score -> ckonlpy.custom_tag.SimpleEvaluator.evaluate
ckonlpy.tag.Twitter.set_selector -> ckonlpy.tag.AbstractTagger.set_evaluator
ckonlpy.custom_tag.SimpleSelector.weight -> ckonlpy.custom_tag.SimpleEvaluator.weight

After: Reason for the change
ckonlpy.tag.Twitter.use_twitter_dictionary: Indicates whether the konlpy.tag.Twitter dictionary is used.
ckonlpy.tag.Twitter.dictionary: Made public.
ckonlpy.tag.Twitter.template_tagger: Renamed to make explicit that it is a template-based tagger, and made public.
ckonlpy.tag.Postprocessor.pos: It post-processes the result of the base tagger, so the function name was unified with the tagger's.
ckonlpy.custom_tag.SimpleEvaluator: The class was renamed from Selector to Evaluator.
ckonlpy.custom_tag.SimpleEvaluator.evaluate: The function that computes the score of a tag-sequence candidate was renamed from score to evaluate.
ckonlpy.tag.AbstractTagger.set_evaluator: The function that sets the scoring function for tag-sequence candidates was renamed. It was also moved from ckonlpy.tag.Twitter to ckonlpy.tag.AbstractTagger.
ckonlpy.custom_tag.SimpleEvaluator.weight: The weight type was changed from {str: float} to [(str, float)].
