

Pinyin-to-Hanzi conversion, pinyin input method engine, pin yin -> 拼音

Pinyin2Hanzi

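Converts pinyin to Chinese characters (Hanzi). It can serve as the conversion engine of a pinyin input method. Compatible with Python 2 and Python 3.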

Installation

Python 2:

$ python setup.py install --user

Python 3:

$ python3 setup.py install --user

Usage

The examples below were run under Python 3.

HMM-based conversion

This mode is based on the Viterbi algorithm.

from Pinyin2Hanzi import DefaultHmmParams
from Pinyin2Hanzi import viterbi

hmmparams = DefaultHmmParams()

## 2 candidates
result = viterbi(hmm_params=hmmparams, observations=('ni', 'zhi', 'bu', 'zhi', 'dao'), path_num=2)
for item in result:
    print(item.score, item.path)
'''Output
1.3155294593897203e-08 ['你', '知', '不', '知', '道']
3.6677865125992192e-09 ['你', '只', '不', '知', '道']
'''

## 2 candidates, with log scores
result = viterbi(hmm_params=hmmparams, observations=('ni', 'zhi', 'bu', 'zhi', 'dao'), path_num=2, log=True)
for item in result:
    print(item.score, item.path)
'''Output
-18.14644152864202 ['你', '知', '不', '知', '道']
-19.423677486918002 ['你', '只', '不', '知', '道']
'''

## 2 candidates, with log scores, but one syllable (`zhii`) is invalid
result = viterbi(hmm_params=hmmparams, observations=('ni', 'zhii', 'bu', 'zhi', 'dao'), path_num=2, log=True)
for item in result:
    print(item.score, item.path)
# Raises KeyError because `zhii` is not normalized pinyin
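
To make the Viterbi idea concrete, here is a minimal, self-contained sketch over a toy HMM in which Hanzi are the hidden states and pinyin syllables are the observations. It is not the library's implementation: the function `toy_viterbi`, the tables, and every probability are made up for illustration, and it returns only the single best path rather than `path_num` candidates.

```python
import math

# Toy, made-up probabilities (NOT the library's trained parameters).
START = {'你': 0.6, '尼': 0.4}                      # P(first hanzi)
TRANSITION = {                                      # P(next hanzi | previous hanzi)
    ('你', '好'): 0.7, ('你', '号'): 0.3,
    ('尼', '好'): 0.2, ('尼', '号'): 0.8,
}
EMISSION = {                                        # P(pinyin | hanzi)
    ('你', 'ni'): 1.0, ('尼', 'ni'): 1.0,
    ('好', 'hao'): 1.0, ('号', 'hao'): 1.0,
}
STATES = {'ni': ['你', '尼'], 'hao': ['好', '号']}  # candidate hanzi per syllable


def toy_viterbi(observations):
    """Return (log_score, path) for the single best hanzi sequence."""
    first = observations[0]
    # best[state] = (log score of the best path ending in state, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMISSION[(s, first)]), [s])
        for s in STATES[first]
    }
    for obs in observations[1:]:
        step = {}
        for cur in STATES[obs]:  # KeyError here for an unknown syllable
            candidates = [
                (score
                 + math.log(TRANSITION[(prev, cur)])
                 + math.log(EMISSION[(cur, obs)]),
                 path + [cur])
                for prev, (score, path) in best.items()
            ]
            # Keep only the best way of reaching `cur`.
            step[cur] = max(candidates, key=lambda c: c[0])
        best = step
    return max(best.values(), key=lambda c: c[0])


log_score, path = toy_viterbi(('ni', 'hao'))
print(path, math.exp(log_score))
```

Working in log space avoids floating-point underflow on long inputs, and `math.exp` recovers the plain probability. The two score styles in the outputs above are related the same way: the natural log of 1.3155294593897203e-08 is about -18.146.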

DAG-based conversion

This mode is based on a word lexicon plus dynamic programming.

from Pinyin2Hanzi import DefaultDagParams
from Pinyin2Hanzi import dag

dagparams = DefaultDagParams()

## 2 candidates
result = dag(dagparams, ('ni', 'bu', 'zhi', 'dao', 'de', 'shi'), path_num=2)
for item in result:
    print(item.score, item.path)
'''Output
0.08117536840088911 ['你不知道', '的是']
0.04149191639287887 ['你不知道', '的诗']
'''

## 2 candidates, with log scores
result = dag(dagparams, ('ni', 'bu', 'zhi', 'dao', 'de', 'shi'), path_num=2, log=True)
for item in result:
    print(item.score, item.path)
'''Output
-2.5111434226494866 ['你不知道', '的是']
-3.1822566564324477 ['你不知道', '的诗']
'''

## 1 candidate
print(dag(dagparams, ['ti', 'chu', 'le', 'bu', 'cuo', 'de', 'jie', 'jve', 'fang', 'an'], path_num=1))
'''Output
[< score=0.0017174549839096384, path=['提出了', '不错', '的', '解决方案'] >]
'''

## 2 candidates, with log scores, but one syllable (`shii`) is invalid
result = dag(dagparams, ('ni', 'bu', 'zhi', 'dao', 'de', 'shii'), path_num=2, log=True)
print(result)
# Prints an empty list because `shii` does not exist
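
The "lexicon plus dynamic programming" idea can be sketched as follows. This is a toy under stated assumptions, not the library's code: `LEXICON`, `MAX_WORD_LEN`, `toy_dag`, and all probabilities are hypothetical, each pinyin span maps to a single word, and a path's score is assumed to be the plain product of its word probabilities.

```python
# Hypothetical mini lexicon: pinyin tuple -> (word, made-up probability).
LEXICON = {
    ('ni',): ('你', 0.3),
    ('bu',): ('不', 0.3),
    ('zhi',): ('只', 0.2),
    ('dao',): ('到', 0.2),
    ('zhi', 'dao'): ('知道', 0.5),
    ('ni', 'bu', 'zhi', 'dao'): ('你不知道', 0.1),
}
MAX_WORD_LEN = 4  # longest lexicon entry, in syllables


def toy_dag(pinyin):
    """Best split of `pinyin` into lexicon words, or None if no split exists."""
    n = len(pinyin)
    # best[i] = (score, words) for the best way to cover pinyin[:i].
    best = [None] * (n + 1)
    best[0] = (1.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            entry = LEXICON.get(tuple(pinyin[start:end]))
            if best[start] is None or entry is None:
                continue  # this span cannot extend any valid prefix
            word, prob = entry
            score = best[start][0] * prob
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n]


print(toy_dag(('ni', 'bu', 'zhi', 'dao')))
print(toy_dag(('ni', 'shii')))
```

In this toy, an unknown syllable such as `shii` matches no lexicon entry, so no full path exists and the function returns `None`. That is analogous to the empty list shown in the example above.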

Custom params

To use your own parameters, implement the two interfaces AbstractHmmParams and AbstractDagParams. See the source code for details.
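
As a rough illustration of what a custom HMM params object could look like, the sketch below backs everything with plain dictionaries. The class `DictHmmParams` is hypothetical, and the method names (`start`, `emission`, `transition`, `get_states`) are assumptions about the interface, so check them against the source before relying on them. To stay runnable without the library installed, the class does not subclass `AbstractHmmParams`; a real implementation would.

```python
class DictHmmParams(object):
    """Hypothetical dict-backed HMM parameters (toy sketch).

    Method names are assumptions about the AbstractHmmParams interface;
    verify them against the library source.
    """

    def __init__(self, start, emission, transition, states, floor=1e-12):
        self._start = start            # {hanzi: P(hanzi starts a sentence)}
        self._emission = emission      # {hanzi: {pinyin: P(pinyin | hanzi)}}
        self._transition = transition  # {prev_hanzi: {next_hanzi: P(next | prev)}}
        self._states = states          # {pinyin: [candidate hanzi, ...]}
        self._floor = floor            # fallback probability for unseen events

    def start(self, state):
        return self._start.get(state, self._floor)

    def emission(self, state, observation):
        return self._emission.get(state, {}).get(observation, self._floor)

    def transition(self, from_state, to_state):
        return self._transition.get(from_state, {}).get(to_state, self._floor)

    def get_states(self, observation):
        # Raises KeyError for a syllable with no candidate hanzi.
        return self._states[observation]


params = DictHmmParams(
    start={'你': 0.6},
    emission={'你': {'ni': 1.0}, '好': {'hao': 1.0}},
    transition={'你': {'好': 0.7}},
    states={'ni': ['你'], 'hao': ['好']},
)
print(params.get_states('ni'), params.transition('你', '好'))
```

The small `floor` value is a design choice of this sketch: it keeps unseen events from producing a zero probability, which would otherwise break log scoring.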

About pinyin

The pinyin you pass in must be “normalized”. For example:

  • 略 -> lve
  • 据 -> ju

To list all “normalized” pinyin:

from Pinyin2Hanzi import all_pinyin
for py in all_pinyin():
    print(py)

To convert pinyin to “normalized” pinyin:

from Pinyin2Hanzi import simplify_pinyin

print(simplify_pinyin('lue'))
# Output: 'lve'

print(simplify_pinyin('lüè'))
# Output: 'lve'
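
For intuition, the kind of normalization shown above can be approximated with the standard library alone. The helper `toy_normalize` below is hypothetical and is not how `simplify_pinyin` is implemented. Its rules (drop tone marks, write ü as v, write a trailing ue as ve) are assumptions inferred from the examples in this README, and the real function may handle more cases.

```python
import unicodedata

# Combining marks for the four tones: macron, acute, caron, grave.
TONE_MARKS = {'\u0304', '\u0301', '\u030c', '\u0300'}
DIAERESIS = '\u0308'  # the two dots of "ü" after NFD decomposition


def toy_normalize(pinyin):
    """Hypothetical helper: lowercase, strip tones, ü -> v, trailing ue -> ve."""
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', pinyin.lower())
    no_tones = ''.join(ch for ch in decomposed if ch not in TONE_MARKS)
    plain = no_tones.replace('u' + DIAERESIS, 'v')
    if plain.endswith('ue'):
        plain = plain[:-2] + 've'
    return plain


for raw in ('lue', 'lüè', 'ju'):
    print(raw, '->', toy_normalize(raw))
```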

To check whether a string is “normalized” pinyin:

from Pinyin2Hanzi import is_pinyin

print(is_pinyin('lue'))
# Output: False

print(is_pinyin('lüè'))
# Output: False

print(is_pinyin('lvee'))
# Output: False

print(is_pinyin('lve'))
# Output: True
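
Because invalid syllables lead to a KeyError (HMM) or an empty result (DAG), it can help to normalize and validate input before converting. The sketch below shows one way to do that. `prepare_syllables` is a hypothetical helper, and the two `demo_*` functions are stand-ins so the snippet runs without the library. In real use you would presumably pass `simplify_pinyin` and `is_pinyin` instead.

```python
def prepare_syllables(syllables, normalize, is_valid):
    """Normalize each syllable, then fail early if any is still invalid."""
    cleaned = tuple(normalize(s) for s in syllables)
    invalid = [s for s in cleaned if not is_valid(s)]
    if invalid:
        raise ValueError('not normalized pinyin: ' + ', '.join(invalid))
    return cleaned


# Stand-ins for illustration only (NOT the library's functions).
DEMO_KNOWN = {'ni', 'lve', 'ju'}


def demo_normalize(syllable):
    return syllable.lower().replace('ue', 've')


def demo_is_valid(syllable):
    return syllable in DEMO_KNOWN


print(prepare_syllables(['Ni', 'lue'], demo_normalize, demo_is_valid))
```

The returned tuple can then be passed as the observations to `viterbi` or as the pinyin list to `dag`.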

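Training

The raw data and the training code are in the train directory. The data comes from jpinyin, pinyin, the Sogou corpus (Internet lexicon), and other sources. The Hanzi-to-pinyin tool ChineseTone was used when processing the data.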
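
One common way to estimate HMM parameters from pinyin-annotated text is simple counting followed by normalization. The sketch below illustrates that general technique on a tiny made-up corpus. It is an assumption that the code in train works along these lines, and `estimate_hmm` is a hypothetical name.

```python
from collections import Counter, defaultdict


def estimate_hmm(sentences):
    """Estimate start/transition/emission probabilities by counting.

    `sentences` is a list of sentences, each a list of (hanzi, pinyin) pairs.
    """
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sent in sentences:
        if not sent:
            continue
        start_counts[sent[0][0]] += 1
        for hanzi, pinyin in sent:
            emit_counts[hanzi][pinyin] += 1
        # Count each adjacent pair of hanzi as one transition.
        for (prev, _), (cur, _) in zip(sent, sent[1:]):
            trans_counts[prev][cur] += 1

    def to_probs(counter):
        total = float(sum(counter.values()))
        return {key: count / total for key, count in counter.items()}

    start = to_probs(start_counts)
    transition = {prev: to_probs(c) for prev, c in trans_counts.items()}
    emission = {hanzi: to_probs(c) for hanzi, c in emit_counts.items()}
    return start, transition, emission


# Tiny made-up corpus for illustration.
corpus = [
    [('你', 'ni'), ('好', 'hao')],
    [('你', 'ni'), ('们', 'men')],
    [('好', 'hao')],
]
start, transition, emission = estimate_hmm(corpus)
print(start, transition, emission)
```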

How it works

See the article “How to implement conversion between pinyin and Hanzi” (如何实现拼音与汉字的互相转换).

License

MIT
