Awesome-Korean-NLP
A curated list of Natural Language Processing (NLP) resources covering:
- NLP of Korean text
- NLP information written in Korean
Feel free to contribute! Or blab it here.
Maintainer: Jaemin Cho
Index
- Tools
- Dataset
- Blogs / Slides / Researchers
- Papers
- Lectures
- Journals / Conferences / Institutes / Events
- Online Communities
- How to contribute
1. Tools
(Korean-specific tools are listed ahead of language-agnostic tools.)
1.1. Morpheme Analyzer + Part of Speech (PoS) Tagger
- Hannanum (한나눔) (Java, C) [link]
  - KoNLPy (Python) [link]
- Kkma (꼬꼬마) (Java) [link] [paper]
  - KoNLPy (Python) [link]
- Komoran (Java) [link]
  - KoNLPy (Python) [link]
- Mecab-ko (C++) [link]
  - KoNLPy (Python) [link]
- Twitter (Scala, Java) [link]
  - KoNLPy (Python) [link]
  - .NET, Node.js, Python, Ruby, Elasticsearch bindings
- dparser (REST API) [link]
- UTagger [link]
- Arirang (Lucene, Java) [link]
- Rouzeta [link] [slide] [video]
- seunjeon (Scala, Java) [link]
- RHINO (라이노) [link]
- KTS [paper]
- 깜짝새 [link]
1.2. Named Entity (NE) Tagger
- annie [link]
1.3. Spell Checker
1.4. Syntax Parser
1.5. Sentiment Analysis
1.6. Translator
1.7. Packages
- KoNLP (R) [link]
- KoNLPy (Python) [link] [paper]
- KoalaNLP (Scala) [link]
- NLTK (Python) [link] [paper]
- gensim (Python) [link]
- FastText (C++) [link]
- FastText.py (Python) [link]
1.8. Others
- Hangulpy (Python) [link]
  - Automatic particle/suffix attachment; jamo decomposition and composition
- Hangulize (Python) [link]
  - Transcribes foreign words into Hangul
- Hanja (Python) [link]
  - Converts Hanja to Hangul
- kroman [link]
  - Hangul Romanization
  - Ruby, Python, NodeJS, Objective-C, Swift
- hangul (Perl) [link]
  - Hangul Romanization
- textrankr (Python) [link] [demo]
  - TextRank-based Korean document summarization
- Korean Word2Vec [demo] [paper]
  - Demo of analogy tests on Korean Word2Vec
- 나쁜 단어 사전 (Bad Word Dictionary) [link]
  - Crowdsourced dictionary of Korean bad words
2. Dataset
- Sejong Corpus [link]
- KAIST Corpus [link]
- Yonsei Univ. Corpus
- Korea Univ. Corpus
- Ulsan Univ. Corpus [link]
- Wikipedia Dump [link] [Extractor]
- NamuWiki Dump [link] [Extractor]
- Naver News Archive [link]
- Chosun Archive [link]
- Naver sentiment movie corpus [link]
- sci-news-sum-kr-50 [link]
3. Blogs / Slides / Researchers
3.1. Blogs
- dsindex's blog [link]
- ์์ฌ์ , "ํผ์ ํ์ผ๋ก ํ๊ตญ์ด ์ฑ๋ด ๊ฐ๋ฐํ๊ธฐ" [link]
- Beomsu Kim, "word2vec ๊ด๋ จ ์ด๋ก ์ ๋ฆฌ" [link]
- CPUU, "Google ์์ฐ์ด ์ฒ๋ฆฌ ์คํ์์ค SyntaxNet ๊ณต๊ฐ" (Korean tranlsation of Google blog) [link]
- theeluwin, "python-crfsuite๋ฅผ ์ฌ์ฉํด์ ํ๊ตญ์ด ์๋ ๋์ด์ฐ๊ธฐ๋ฅผ ํ์ตํด๋ณด์" [link]
- Jaesoo Lim, "ํ๊ตญ์ด ํํ์ ๋ถ์๊ธฐ ๋ํฅ" [link]
3.2. Slides
- Lucy Park, "ํ๊ตญ์ด์ NLTK, Gensim์ ๋ง๋จ" (PyCon APAC 2015) [link]
- Jeongkyu Shin, "Building AI Chat bot using Python 3 & TensorFlow" (PyCon APAC 2016) [link]
- Changki Lee, "RNN & NLP Application" (Kangwon Univ. Machine Learning course) [link]
- Kyunghoon Kim, "๋ด์ค๋ฅผ ์ฌ๋ฏธ์๊ฒ ๋ง๋๋ ๋ฐฉ๋ฒ; ๋ด์ค์ผ" (PyCon APAC 2016) [link]
- Hongjoo Lee, "Python ์ผ๋ก 19๋ ๊ตญํ ๋ฝ๊ฐ๊ธฐ" (PyCon APAC 2016) [link]
- Kyumin Choi,"word2vecแแ ต แแ ฎแแ ฅแซแแ ตแแ ณแแ ฆแทแแ ณแฏ แแ กแซแแ กแปแแ ณแฏ แแ ข" (PyCon APAC 2015) [link]
- ้ฒ่ค่ฃไน (translated by Hongbae Kim), "๋ฅ๋ฌ๋์ ์ด์ฉํ ์์ฐ์ด์ฒ๋ฆฌ์ ์ฐ๊ตฌ๋ํฅ" [link]
- Hongbae Kim, "๋จธ์ ๋ฌ๋์ ์์ฐ์ด ์ฒ๋ฆฌ๊ธฐ์ (I)" [link]
- Changki Lee, "์์ฐ์ด์ฒ๋ฆฌ๋ฅผ ์ํ ๊ธฐ๊ณํ์ต ์๊ฐ" [link]
- Taeil Kim, Daeneung Son, "๊ธฐ๊ณ ๋ฒ์ญ ๋ชจ๋ธ ๊ธฐ๋ฐ ์ง์ ๊ต์ ์์คํ " (Naver DEVIEW 2015) [link]
4. Papers
4.1. Korean
- 김동준, 이연수, 장정선, 임해창 (Korea University, NCSOFT Corp.), "한국어 대화 화행 분류를 위한 어휘 자질의 임베딩" (Embedding of lexical features for Korean dialogue act classification; proceedings of the 2015 winter conference) [paper] (dead link)
4.2. English
5. Lectures
5.1. Korean Lectures
- Kangwon Univ. Natural Language Processing [link]
- 데이터 사이언스 스쿨 (Data Science School) [link]
- SNU Data Mining / Business Analytics [link]
5.2. English Lectures
- Stanford CS224n: Natural Language Processing [link] [YouTube]
- Stanford CS224d: Deep Learning for Natural Language Processing [link] [YouTube]
- NLTK with Python 3 for NLP (by Sentdex) [YouTube]
- LDA Topic Models [link]
6. Conferences / Institutes / Events
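6.1. Conferences
- 한글 및 한국어 정보처리 학술대회 (Conference on Hangul and Korean Information Processing) [link]
- KIPS (Korea Information Processing Society) [link]
- 한국음성학회 학술대회 (Korean Society of Speech Sciences conference) [link]
6.2. Institutes
- 언어공학연구회 (Language Engineering Research Society) [link]
- 한글 및 한국어 정보처리 학술대회 (Conference on Hangul and Korean Information Processing; since 1989, held annually) [link]
- 국어 정보 처리 시스템 경진대회 (Korean Language Information Processing System Competition; since 2010, held annually; hosted by the Ministry of Culture, Sports and Tourism and the National Institute of Korean Language) [link]
- 자연언어처리 튜토리얼 (Natural Language Processing Tutorial; held irregularly) [link]
- 자연어처리 및 정보검색 워크샵 (Workshop on Natural Language Processing and Information Retrieval) [link]
- 한국음성학회 (Korean Society of Speech Sciences) [link]
6.3. Events / Contests
- 국어 정보 처리 시스템 경진 대회 (Korean Language Information Processing System Competition) [link]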
7. Online Communities
- Tensorflow KR (Facebook Group) [link]
- AI Korea (Facebook Group) [link]
- Bot Group (Facebook Group) [link]
- 바벨피쉬 (Babelfish) (Facebook Group) [link]
- Reddit Machine Learning Top posts [link]
8. How to contribute
- Fork this repository by clicking the "Fork" icon at the top right corner.
- Get the link for the forked repo by clicking the green button on your page. It looks something like "https://github.com/[username]/Awesome-Korean-NLP.git".
- On your local machine, run "git clone https://github.com/[username]/Awesome-Korean-NLP.git".
- Run "cd Awesome-Korean-NLP".
- Open "README.md" with your favorite text editor.
- Edit.
- Commit your change, e.g. git commit -a -m "added section 8: emoticons".
- Run "git push" and verify the change on your fork.
- Go to https://github.com/datanada/Awesome-Korean-NLP and create a pull request.
- Choose "compare across forks" with base: datanada/Awesome.. and head: [username]/Awesome..
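
The command-line part of the steps above can be sketched as a small shell script. This is only an illustration under a few assumptions: `GITHUB_USER`, `DRY_RUN`, and the `run` helper are names made up for this sketch (they are not part of the repository), and `your-username` is a placeholder for your own GitHub account. By default the script only prints each command; forking and opening the pull request still happen on the GitHub website.

```shell
#!/bin/sh
# Hypothetical placeholder: replace with your own GitHub username.
GITHUB_USER="${GITHUB_USER:-your-username}"
FORK_URL="https://github.com/${GITHUB_USER}/Awesome-Korean-NLP.git"

# Assumption for this sketch: commands are only printed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Clone the fork, enter it, edit README.md, commit, and push.
# The commit message is the example used in the steps above.
run git clone "$FORK_URL" &&
run cd Awesome-Korean-NLP &&
run ${EDITOR:-vi} README.md &&
run git commit -a -m "added section 8: emoticons" &&
run git push
```

To execute the commands for real, save the script under any name and run it with `DRY_RUN=0` and `GITHUB_USER` set to your account, then continue with the pull request steps on GitHub.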