KoG2P
Given an input of a series of Korean graphemes/letters (i.e. Hangul), KoG2P outputs the corresponding pronunciations.
KoG2P is a Python-based G2P package that generates pronunciation sequences from Korean text strings.
To use it, run it from the terminal together with the string you want to convert.
How to use?
In a terminal, simply pass your input in quotes:
$ python g2p.py '박물관'
Then you'll get /방물관/ symbolized as follows:
p0 aa ng mm uu ll k0 wa nf
NB. Your input need not be a lemma or even a well-formed Korean word; for any sequence in Hangul, the system produces an output based on the phonological rules of Korean.
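To call the script from another Python program, one option is to wrap the terminal command with `subprocess`. The sketch below assumes that `g2p.py` sits in the working directory and prints the symbol sequence to standard output. The helper names `build_command`, `split_symbols`, and `run_kog2p` are hypothetical and not part of KoG2P, and `capture_output` requires Python 3.7 or later.

```python
import subprocess
import sys


def build_command(text, script="g2p.py"):
    """Return the argument list equivalent to: python g2p.py '<text>'."""
    # Passing a list (no shell) means the Hangul string needs no quoting.
    return [sys.executable, script, text]


def split_symbols(output):
    """Split a space-separated KoG2P symbol string into a list."""
    return output.split()


def run_kog2p(text, script="g2p.py"):
    """Run the script and return its symbols (assumes they go to stdout)."""
    result = subprocess.run(
        build_command(text, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return split_symbols(result.stdout)
```

With the repository checked out, `run_kog2p('박물관')` would be expected to return the symbols shown above as a list, provided the stdout assumption holds.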
Requirement
- Python 2.7 or 3.x
Symbol table
Please check out the symbol table below for the mapping.
C/V | Position | Symbols in Hangul | Symbols in KoG2P |
---|---|---|---|
consonant | onset | ㅂ | p0 |
consonant | onset | ㅍ | ph |
consonant | onset | ㅃ | pp |
consonant | onset | ㄷ | t0 |
consonant | onset | ㅌ | th |
consonant | onset | ㄸ | tt |
consonant | onset | ㄱ | k0 |
consonant | onset | ㅋ | kh |
consonant | onset | ㄲ | kk |
consonant | onset | ㅅ | s0 |
consonant | onset | ㅆ | ss |
consonant | onset | ㅎ | h0 |
consonant | onset | ㅈ | c0 |
consonant | onset | ㅊ | ch |
consonant | onset | ㅉ | cc |
consonant | onset | ㅁ | mm |
consonant | onset | ㄴ | nn |
consonant | onset | ㄹ | rr |
consonant | coda | ㅂ | pf |
consonant | coda | ㅍ | ph |
consonant | coda | ㄷ | tf |
consonant | coda | ㅌ | th |
consonant | coda | ㄱ | kf |
consonant | coda | ㅋ | kh |
consonant | coda | ㄲ | kk |
consonant | coda | ㅅ | s0 |
consonant | coda | ㅆ | ss |
consonant | coda | ㅎ | h0 |
consonant | coda | ㅈ | c0 |
consonant | coda | ㅊ | ch |
consonant | coda | ㅁ | mf |
consonant | coda | ㄴ | nf |
consonant | coda | ㅇ | ng |
consonant | coda | ㄹ | ll |
consonant | coda | ㄱㅅ | ks |
consonant | coda | ㄴㅈ | nc |
consonant | coda | ㄴㅎ | nh |
consonant | coda | ㄹㄱ | lk |
consonant | coda | ㄹㅁ | lm |
consonant | coda | ㄹㅂ | lb |
consonant | coda | ㄹㅅ | ls |
consonant | coda | ㄹㅌ | lt |
consonant | coda | ㄹㅍ | lp |
consonant | coda | ㄹㅎ | lh |
consonant | coda | ㅂㅅ | ps |
vowel | monophthong | ㅣ | ii |
vowel | monophthong | ㅔ | ee |
vowel | monophthong | ㅐ | qq |
vowel | monophthong | ㅏ | aa |
vowel | monophthong | ㅡ | xx |
vowel | monophthong | ㅓ | vv |
vowel | monophthong | ㅜ | uu |
vowel | monophthong | ㅗ | oo |
vowel | diphthong | ㅖ | ye |
vowel | diphthong | ㅒ | yq |
vowel | diphthong | ㅑ | ya |
vowel | diphthong | ㅕ | yv |
vowel | diphthong | ㅠ | yu |
vowel | diphthong | ㅛ | yo |
vowel | diphthong | ㅟ | wi |
vowel | diphthong | ㅚ | wo |
vowel | diphthong | ㅙ | wq |
vowel | diphthong | ㅞ | we |
vowel | diphthong | ㅘ | wa |
vowel | diphthong | ㅝ | wv |
vowel | diphthong | ㅢ | xi |
NB. IPA symbols for Korean phones can be found on the following page: IPA for Korean.
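To illustrate how the table is organized by position, here is a minimal Python 3 sketch that splits each precomposed Hangul syllable into onset, vowel, and coda using standard Unicode arithmetic and looks up the table symbol for each slot. It is not KoG2P's implementation and applies no phonological rules. The name `naive_symbols` is hypothetical, the silent onset ㅇ is assumed to produce no symbol, and `qq` for ㅐ is assumed by analogy with `yq` and `wq`.

```python
# Symbols follow the table above, listed in Unicode jamo order.
# The onset ㅇ is silent, so it maps to no symbol (an assumption of this sketch).
ONSETS = ["k0", "kk", "nn", "t0", "tt", "rr", "mm", "p0", "pp", "s0",
          "ss", "", "c0", "cc", "ch", "kh", "th", "ph", "h0"]
# "qq" for ㅐ is assumed by analogy with "yq" and "wq".
VOWELS = ["aa", "qq", "ya", "yq", "vv", "ee", "yv", "ye", "oo", "wa", "wq",
          "wo", "yo", "uu", "wv", "we", "wi", "yu", "xx", "xi", "ii"]
CODAS = ["", "kf", "kk", "ks", "nf", "nc", "nh", "tf", "ll", "lk", "lm",
         "lb", "ls", "lt", "lp", "lh", "mf", "pf", "ps", "s0", "ss", "ng",
         "c0", "ch", "kh", "th", "ph", "h0"]

HANGUL_BASE = 0xAC00  # first precomposed syllable, 가
HANGUL_LAST = 0xD7A3  # last precomposed syllable, 힣


def naive_symbols(text):
    """Map each Hangul syllable to table symbols, with no phonological rules."""
    symbols = []
    for char in text:
        code = ord(char)
        if not HANGUL_BASE <= code <= HANGUL_LAST:
            continue  # skip anything that is not a precomposed syllable
        index = code - HANGUL_BASE
        # Each onset spans 21 vowels x 28 codas; each vowel spans 28 codas.
        onset, rest = divmod(index, 21 * 28)
        vowel, coda = divmod(rest, 28)
        for symbol in (ONSETS[onset], VOWELS[vowel], CODAS[coda]):
            if symbol:
                symbols.append(symbol)
    return symbols


print(" ".join(naive_symbols("박물관")))
# p0 aa kf mm uu ll k0 wa nf
```

Compared with the KoG2P output for the same input, the naive lookup yields `kf` where KoG2P gives `ng`: the coda ㄱ is nasalized before ㅁ, which is the kind of phonological rule this sketch leaves out.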
Reference
Please cite the following if using this code:
@misc{cho2017kog2p,
title = {Korean Grapheme-to-Phoneme Analyzer (KoG2P)},
author = {Yejin Cho},
year = {2017},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/scarletcho/KoG2P}}
}
Thank you for your citations!
- Yoon Seok Hong, Kyung Seo Ki, and Gahgene Gweon. 2018. Automatic Miscue Detection Using RNN Based Models with Data Augmentation. In Proc. Interspeech 2018. 1646-1650. [pdf]
- Younggun Lee and Taesu Kim. 2018. Learning pronunciation from a foreign language in speech synthesis network. arXiv preprint. arXiv:1811.09364. [pdf]