
A curated list of resources for NLP (Natural Language Processing) for Korean

Awesome-Korean-NLP

A curated list of Natural Language Processing (NLP) resources, covering:

  • NLP of Korean text
  • NLP information written in Korean

Feel free to contribute, or blab it here.

Maintainer: Jaemin Cho

Index

  1. Tools
  2. Dataset
  3. Blogs / Slides / Researchers
  4. Papers
  5. Lectures
  6. Conferences / Institutes / Events
  7. Online Communities
  8. How to contribute

1. Tools

(Korean-specific tools are listed ahead of language-agnostic tools.)

1.1. Morpheme Analyzer / 형태소 분석기 + Part of Speech (PoS) Tagger / 품사 Tagger

  • Hannanum (한나눔) (Java, C) [link]
    • KoNLPy (Python) [link]
  • Kkma (꼬꼬마) (Java) [link] [paper]
    • KoNLPy (Python) [link]
  • Komoran (Java) [link]
    • KoNLPy (Python) [link]
  • Mecab-ko (C++) [link]
    • KoNLPy (Python) [link]
  • Twitter (Scala, Java) [link]
    • KoNLPy (Python) [link]
    • .NET, Node.js, Python, Ruby, Elasticsearch bindings
  • dparser (REST API) [link]
  • UTagger [link]
  • Arirang (Lucene, Java) [link]
  • Rouzeta [link] [slide] [video]
  • seunjeon (Scala, Java) [link]
  • RHINO (라이노) [link]
  • KTS [paper]
  • 깜짝새 [link]

1.2. Named Entity (NE) Tagger / 개체명 인식기

1.3. Spell Checker / 맞춤법 검사기

  • PNU Spell Checker [link]
  • Naver Spell Checker [link]
  • Daum Spell Checker [link]
  • hunspell-ko [link]

1.4. Syntax Parser / 구문 분석기

  • dparser (REST API) [link]
  • NLP HUB (Java) [link]

1.5. Sentiment Analysis / 감정 분석기

  • OpenHangul (오픈한글) [link] [paper]

1.6. Translator / 번역기

1.7. Packages

1.8. Others / 기타

  • Hangulpy (Python) [link]
    • Automatic particle/suffix attachment; jamo decomposition and composition
  • Hangulize (Python) [link]
    • Transcription of foreign words into Hangul
  • Hanja (Python) [link]
    • Hanja-to-Hangul conversion
  • kroman [link]
  • hangul (Perl) [link]
    • Hangul Romanization
  • textrankr (Python) [link] [demo]
    • TextRank-based Korean document summarization
  • 한국어 Word2Vec (Korean Word2Vec) [demo] [paper]
    • Demo of an analogy test for Korean Word2Vec
  • 나쁜 단어 사전 (Bad Word Dictionary) [link]
    • Crowdsourced dictionary of Korean bad words
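
The jamo decomposition/composition and particle attachment mentioned for Hangulpy above rest on the regular layout of precomposed Hangul syllables in Unicode. The sketch below illustrates that arithmetic with the standard library only. It is not Hangulpy's API: the function names `decompose`, `compose`, and `attach_topic_particle` are hypothetical, and the particle rule covers only the 은/는 topic marker.

```python
# Sketch of Hangul syllable <-> jamo arithmetic based on the Unicode layout.
# All function names are illustrative, not the API of any tool listed above.

HANGUL_BASE = 0xAC00   # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of '힣', the last precomposed syllable
NUM_VOWELS = 21
NUM_FINALS = 28        # 27 final consonants plus "no final"

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(syllable):
    """Split one precomposed Hangul syllable into (lead, vowel, final)."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    index = code - HANGUL_BASE
    lead, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    return LEADS[lead], VOWELS[vowel], FINALS[final]


def compose(lead, vowel, final=""):
    """Combine jamo back into one precomposed Hangul syllable."""
    index = (LEADS.index(lead) * NUM_VOWELS + VOWELS.index(vowel)) * NUM_FINALS
    index += FINALS.index(final)
    return chr(HANGUL_BASE + index)


def attach_topic_particle(word):
    """Append 은 after a final consonant and 는 after a vowel."""
    _, _, final = decompose(word[-1])
    return word + ("은" if final else "는")


if __name__ == "__main__":
    print(decompose("한"))
    print(compose("ㄱ", "ㅡ", "ㄹ"))
    print(attach_topic_particle("사람"), attach_topic_particle("나무"))
```

Input that ends in a non-Hangul character raises `ValueError`; a real tool would need to handle digits, Latin letters, and particles with irregular rules.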

2. Dataset

  • Sejong Corpus [link]
  • KAIST Corpus [link]
  • Yonsei Univ. Corpus
  • Korea Univ. Corpus
  • Ulsan Univ. Corpus [link]
  • Wikipedia Dump [link] [Extractor]
  • NamuWiki Dump [link] [Extractor]
  • Naver News Archive [link]
  • Chosun Archive [link]
  • Naver sentiment movie corpus [link]
  • sci-news-sum-kr-50 [link]

3. Blogs / Slides / Researchers

3.1. Blogs

  • dsindex's blog [link]
  • ์—‘์‚ฌ์  , "ํ˜ผ์ž ํž˜์œผ๋กœ ํ•œ๊ตญ์–ด ์ฑ—๋ด‡ ๊ฐœ๋ฐœํ•˜๊ธฐ" [link]
  • Beomsu Kim, "word2vec ๊ด€๋ จ ์ด๋ก  ์ •๋ฆฌ" [link]
  • CPUU, "Google ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ์˜คํ”ˆ์†Œ์Šค SyntaxNet ๊ณต๊ฐœ" (Korean tranlsation of Google blog) [link]
  • theeluwin, "python-crfsuite๋ฅผ ์‚ฌ์šฉํ•ด์„œ ํ•œ๊ตญ์–ด ์ž๋™ ๋„์–ด์“ฐ๊ธฐ๋ฅผ ํ•™์Šตํ•ด๋ณด์ž" [link]
  • Jaesoo Lim, "ํ•œ๊ตญ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ ๋™ํ–ฅ" [link]

3.2. Slides

  • Lucy Park, "ํ•œ๊ตญ์–ด์™€ NLTK, Gensim์˜ ๋งŒ๋‚จ" (PyCon APAC 2015) [link]
  • Jeongkyu Shin, "Building AI Chat bot using Python 3 & TensorFlow" (PyCon APAC 2016) [link]
  • Changki Lee, "RNN & NLP Application" (Kangwon Univ. Machine Learning course) [link]
  • Kyunghoon Kim, "๋‰ด์Šค๋ฅผ ์žฌ๋ฏธ์žˆ๊ฒŒ ๋งŒ๋“œ๋Š” ๋ฐฉ๋ฒ•; ๋‰ด์Šค์žผ" (PyCon APAC 2016) [link]
  • Hongjoo Lee, "Python ์œผ๋กœ 19๋Œ€ ๊ตญํšŒ ๋ฝ€๊ฐœ๊ธฐ" (PyCon APAC 2016) [link]
  • Kyumin Choi,"word2vecแ„‹แ…ต แ„Žแ…ฎแ„Žแ…ฅแ†ซแ„‰แ…ตแ„‰แ…ณแ„แ…ฆแ†ทแ„‹แ…ณแ†ฏ แ„†แ…กแ†ซแ„‚แ…กแ†ปแ„‹แ…ณแ†ฏ แ„„แ…ข" (PyCon APAC 2015) [link]
  • ้€ฒ่—ค่ฃ•ไน‹ (translated by Hongbae Kim), "๋”ฅ๋Ÿฌ๋‹์„ ์ด์šฉํ•œ ์ž์—ฐ์–ด์ฒ˜๋ฆฌ์˜ ์—ฐ๊ตฌ๋™ํ–ฅ" [link]
  • Hongbae Kim, "๋จธ์‹ ๋Ÿฌ๋‹์˜ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๊ธฐ์ˆ (I)" [link]
  • Changki Lee, "์ž์—ฐ์–ด์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ๊ธฐ๊ณ„ํ•™์Šต ์†Œ๊ฐœ" [link]
  • Taeil Kim, Daeneung Son, "๊ธฐ๊ณ„ ๋ฒˆ์—ญ ๋ชจ๋ธ ๊ธฐ๋ฐ˜ ์งˆ์˜ ๊ต์ • ์‹œ์Šคํ…œ" (Naver DEVIEW 2015) [link]

4. Papers

4.1. Korean

  • 김동준, 이연수, 장정선, 임해창 (Korea University, NCSOFT), "한국어 대화 화행 분류를 위한 어휘 자질의 임베딩" ("Embedding of Lexical Features for Korean Dialogue Speech Act Classification"), 2015 Winter Conference Proceedings [paper] (dead link)

4.2. English

5. Lectures

5.1. Korean Lectures

  • Kangwon Univ. 자연언어처리 (Natural Language Processing) [link]
  • 데이터 사이언스 스쿨 (Data Science School) [link]
  • SNU Data Mining / Business Analytics [link]

5.2. English Lectures

  • Stanford CS224n: Natural Language Processing [link] [YouTube]
  • Stanford CS224d: Deep Learning for Natural Language Processing [link] [YouTube]
  • NLTK with Python 3 for NLP (by Sentdex) [YouTube]
  • LDA Topic Models [link]

6. Conferences / Institutes / Events

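6.1. Conferences

  • 한글 및 한국어 정보처리 학술대회 (Conference on Hangul and Korean Information Processing) [link]
  • KIPS (한국정보처리학회, Korea Information Processing Society) [link]
  • 한국음성학회 학술대회 (Korean Society of Speech Sciences conference) [link]

6.2. Institutes

  • 언어공학연구회 (Language Engineering Research Society) [link]
    • 한글 및 한국어 정보처리 학술대회 (Conference on Hangul and Korean Information Processing; since 1989, held annually) [link]
    • 국어 정보 처리 시스템 경진대회 (Korean Language Information Processing System Competition; since 2010, held annually; hosted by the Ministry of Culture, Sports and Tourism and the National Institute of Korean Language) [link]
    • 자연언어처리 튜토리얼 (Natural Language Processing Tutorial; held irregularly) [link]
    • 자연어처리 및 정보검색 워크샵 (Workshop on Natural Language Processing and Information Retrieval) [link]
  • 한국음성학회 (Korean Society of Speech Sciences) [link]

6.3. Events / Contests

  • 국어 정보 처리 시스템 경진 대회 (Korean Language Information Processing System Competition) [link]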

7. Online Communities

  • Tensorflow KR (Facebook Group) [link]
  • AI Korea (Facebook Group) [link]
  • Bot Group (Facebook Group) [link]
  • 바벨피쉬 (Babel Fish) (Facebook Group) [link]
  • Reddit Machine Learning Top posts [link]

8. How to contribute

  1. Fork this repository by clicking the "Fork" icon at the top right corner.

  2. Get the link for the forked repo by clicking the green button on your fork's page. It looks something like "https://github.com/[username]/Awesome-Korean-NLP.git".

  3. On your local machine, run "git clone https://github.com/[username]/Awesome-Korean-NLP.git".

  4. Run "cd Awesome-Korean-NLP".

  5. Open "README.md" with your favorite text editor.

  6. Edit.

  7. Commit with a descriptive message, for example: git commit -a -m "added section 8: emoticons"

  8. Run "git push" and verify the change on your fork.

  9. Go to https://github.com/datanada/Awesome-Korean-NLP and create a pull request.

  10. Choose "compare across forks" with base: datanada/Awesome.. and head: [username]/Awesome..

[beginners guide]