
Korean grapheme-to-phone conversion in Python

KoG2P

Given an input of a series of Korean graphemes/letters (i.e. Hangul), KoG2P outputs the corresponding pronunciations.

ν•œκ΅­μ–΄μ˜ λ¬Έμžμ—΄λ‘œλΆ€ν„° λ°œμŒμ—΄μ„ μƒμ„±ν•˜λŠ” 파이썬 기반 G2P νŒ¨ν‚€μ§€μž…λ‹ˆλ‹€.
ν„°λ―Έλ„μ—μ„œ μ›ν•˜λŠ” λ¬Έμžμ—΄μ„ ν•¨κ»˜ μž…λ ₯ν•΄ μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€.

How to use?

In a terminal, pass your input in quotes:

$ python g2p.py '박물관'

This returns the pronunciation /방물관/, written in KoG2P symbols as follows:

p0 aa ng mm uu ll k0 wa nf

NB. The input does not need to be a lemma or even a valid Korean word; for any Hangul sequence, the system produces output based on the phonological rules of Korean.
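If you want to call the script from other Python code, one option is to wrap the command line shown above with `subprocess`. The following is a minimal sketch, not part of KoG2P itself: the helper names `build_command` and `run_kog2p` are hypothetical, and it assumes that `g2p.py` is in the current working directory and prints the symbol sequence to standard output.

```python
import subprocess
import sys


def build_command(text, script="g2p.py"):
    """Return the argv list equivalent to: python g2p.py '<text>'."""
    # Passing a list (no shell) means the input needs no extra quoting.
    return [sys.executable, script, text]


def run_kog2p(text, script="g2p.py"):
    """Run the script and return its output split into symbols.

    Assumption: the script prints the symbols to standard output,
    separated by whitespace. Requires Python 3.7+ for capture_output.
    """
    result = subprocess.run(
        build_command(text, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout.split()
```

With the repository checked out, `run_kog2p('박물관')` would be expected to return the symbols from the example above as a list, provided the assumptions hold.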

Requirement

  • Python 2.7 or 3.x

Symbol table

The table below maps Hangul letters to KoG2P symbols.

| C/V | Position | Symbols in Hangul | Symbols in KoG2P |
|---|---|---|---|
| consonant | onset | ㅂ | p0 |
| consonant | onset | ㅍ | ph |
| consonant | onset | ㅃ | pp |
| consonant | onset | ㄷ | t0 |
| consonant | onset | ㅌ | th |
| consonant | onset | ㄸ | tt |
| consonant | onset | ㄱ | k0 |
| consonant | onset | ㅋ | kh |
| consonant | onset | ㄲ | kk |
| consonant | onset | ㅅ | s0 |
| consonant | onset | ㅆ | ss |
| consonant | onset | ㅎ | h0 |
| consonant | onset | ㅈ | c0 |
| consonant | onset | ㅊ | ch |
| consonant | onset | ㅉ | cc |
| consonant | onset | ㅁ | mm |
| consonant | onset | ㄴ | nn |
| consonant | onset | ㄹ | rr |
| consonant | coda | ㅂ | pf |
| consonant | coda | ㅍ | ph |
| consonant | coda | ㄷ | tf |
| consonant | coda | ㅌ | th |
| consonant | coda | ㄱ | kf |
| consonant | coda | ㅋ | kh |
| consonant | coda | ㄲ | kk |
| consonant | coda | ㅅ | s0 |
| consonant | coda | ㅆ | ss |
| consonant | coda | ㅎ | h0 |
| consonant | coda | ㅈ | c0 |
| consonant | coda | ㅊ | ch |
| consonant | coda | ㅁ | mf |
| consonant | coda | ㄴ | nf |
| consonant | coda | ㅇ | ng |
| consonant | coda | ㄹ | ll |
| consonant | coda | ㄱㅅ | ks |
| consonant | coda | ㄴㅈ | nc |
| consonant | coda | ㄴㅎ | nh |
| consonant | coda | ㄹㄱ | lk |
| consonant | coda | ㄹㅁ | lm |
| consonant | coda | ㄹㅂ | lb |
| consonant | coda | ㄹㅅ | ls |
| consonant | coda | ㄹㅌ | lt |
| consonant | coda | ㄹㅍ | lp |
| consonant | coda | ㄹㅎ | lh |
| consonant | coda | ㅂㅅ | ps |
| vowel | monophthong | ㅣ | ii |
| vowel | monophthong | ㅔ | ee |
| vowel | monophthong | ㅐ | qq |
| vowel | monophthong | ㅏ | aa |
| vowel | monophthong | ㅡ | xx |
| vowel | monophthong | ㅓ | vv |
| vowel | monophthong | ㅜ | uu |
| vowel | monophthong | ㅗ | oo |
| vowel | diphthong | ㅖ | ye |
| vowel | diphthong | ㅒ | yq |
| vowel | diphthong | ㅑ | ya |
| vowel | diphthong | ㅕ | yv |
| vowel | diphthong | ㅠ | yu |
| vowel | diphthong | ㅛ | yo |
| vowel | diphthong | ㅟ | wi |
| vowel | diphthong | ㅚ | wo |
| vowel | diphthong | ㅙ | wq |
| vowel | diphthong | ㅞ | we |
| vowel | diphthong | ㅘ | wa |
| vowel | diphthong | ㅝ | wv |
| vowel | diphthong | ㅢ | xi |

NB. IPA symbols for Korean phones can be found on the following page: IPA for Korean.
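To make the onset/vowel/coda mapping concrete, here is a small sketch that spells out a Hangul string letter by letter in KoG2P symbols. It is not KoG2P's implementation and applies none of the phonological rules; it only decomposes each precomposed syllable with standard Unicode arithmetic and looks up the symbols from the symbol table. The function name `to_symbols` is hypothetical, and treating onset ㅇ as silent (no symbol) is an assumption, since the table lists ㅇ only as a coda.

```python
# Symbols listed in the Unicode jamo order used by precomposed Hangul syllables.
# Onset order: ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
ONSETS = ["k0", "kk", "nn", "t0", "tt", "rr", "mm", "p0", "pp", "s0",
          "ss", "", "c0", "cc", "ch", "kh", "th", "ph", "h0"]  # "" = silent ㅇ (assumption)
# Vowel order: ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
VOWELS = ["aa", "qq", "ya", "yq", "vv", "ee", "yv", "ye", "oo", "wa", "wq",
          "wo", "yo", "uu", "wv", "we", "wi", "yu", "xx", "xi", "ii"]
# Coda order: (none) ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
CODAS = ["", "kf", "kk", "ks", "nf", "nc", "nh", "tf", "ll", "lk", "lm",
         "lb", "ls", "lt", "lp", "lh", "mf", "pf", "ps", "s0", "ss", "ng",
         "c0", "ch", "kh", "th", "ph", "h0"]


def to_symbols(text):
    """Spell out Hangul syllables in KoG2P symbols, without phonological rules."""
    symbols = []
    for ch in text:
        index = ord(ch) - 0xAC00  # offset from the first syllable, U+AC00
        if not 0 <= index < 11172:
            continue  # skip anything that is not a precomposed Hangul syllable
        onset, rest = divmod(index, 588)  # 588 = 21 vowels * 28 codas
        vowel, coda = divmod(rest, 28)
        for sym in (ONSETS[onset], VOWELS[vowel], CODAS[coda]):
            if sym:
                symbols.append(sym)
    return " ".join(symbols)


print(to_symbols("방물관"))  # p0 aa ng mm uu ll k0 wa nf
print(to_symbols("박물관"))  # p0 aa kf mm uu ll k0 wa nf
```

The second call shows what the sketch leaves out: spelled literally, 박물관 has the coda kf, whereas KoG2P's output for the same input is the pronounced form with ng.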

Reference

Please cite the following if using this code:

@misc{cho2017kog2p,
  title = {Korean Grapheme-to-Phoneme Analyzer (KoG2P)},
  author = {Yejin Cho},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/scarletcho/KoG2P}}
}

Thank you for your citations!

  • Yoon Seok Hong, Kyung Seo Ki, and Gahgene Gweon. 2018. Automatic Miscue Detection Using RNN Based Models with Data Augmentation. In Proc. Interspeech 2018. 1646-1650. [pdf]

  • Younggun Lee and Taesu Kim. 2018. Learning pronunciation from a foreign language in speech synthesis network. arXiv preprint. arXiv:1811.09364. [pdf]