• Stars: 289
• Rank: 142,568 (top 3%)
• Language: Python
• License: MIT License
• Created over 8 years ago
• Updated 8 months ago

Repository Details

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

jaconv

jaconv (Japanese Converter) is an interconverter for Hiragana, Katakana, Hankaku (half-width characters), and Zenkaku (full-width characters).

Japanese README is available.

INSTALLATION

$ pip install jaconv

USAGE

See also the documentation.

import jaconv

# Hiragana to Katakana
jaconv.hira2kata('ともえまみ')
# => 'トモエマミ'

# Hiragana to half-width Katakana
jaconv.hira2hkata('ともえまみ')
# => 'ﾄﾓｴﾏﾐ'

# Katakana to Hiragana
jaconv.kata2hira('巴マミ')
# => '巴まみ'

# half-width character to full-width character
# default parameters are as follows: kana=True, ascii=False, digit=False
jaconv.h2z('ﾃｨﾛ･ﾌｨﾅｰﾚ')
# => 'ティロ・フィナーレ'

# half-width character to full-width character
# but only ascii characters
jaconv.h2z('abc', kana=False, ascii=True, digit=False)
# => 'ａｂｃ'

# half-width character to full-width character
# but only digit characters
jaconv.h2z('123', kana=False, ascii=False, digit=True)
# => '１２３'

# half-width character to full-width character
# except half-width Katakana
jaconv.h2z('ｱabc123', kana=False, digit=True, ascii=True)
# => 'ｱａｂｃ１２３'

# an alias of h2z
jaconv.hankaku2zenkaku('ﾃｨﾛ･ﾌｨﾅｰﾚabc123')
# => 'ティロ・フィナーレabc123'

# full-width character to half-width character
# default parameters are as follows: kana=True, ascii=False, digit=False
jaconv.z2h('ティロ・フィナーレ')
# => 'ﾃｨﾛ･ﾌｨﾅｰﾚ'

# full-width character to half-width character
# but only ascii characters
jaconv.z2h('ａｂｃ', kana=False, ascii=True, digit=False)
# => 'abc'

# full-width character to half-width character
# but only digit characters
jaconv.z2h('１２３', kana=False, ascii=False, digit=True)
# => '123'

# full-width character to half-width character
# except full-width Katakana
jaconv.z2h('アａｂｃ１２３', kana=False, digit=True, ascii=True)
# => 'アabc123'

# an alias of z2h
jaconv.zenkaku2hankaku('ティロ・フィナーレａｂｃ１２３')
# => 'ﾃｨﾛ･ﾌｨﾅｰﾚａｂｃ１２３'

# normalize
jaconv.normalize('ティロ・フィナ〜レ', 'NFKC')
# => 'ティロ・フィナーレ'

# Hiragana to alphabet
jaconv.kana2alphabet('じゃぱん')
# => 'japan'

# Alphabet to Hiragana
jaconv.alphabet2kana('japan')
# => 'じゃぱん'

# Katakana to Alphabet
jaconv.kata2alphabet('ケツイ')
# => 'ketsui'

# Alphabet to Katakana
jaconv.alphabet2kata('namba')
# => 'ナンバ'

# Hiragana to Julius's phoneme format
jaconv.hiragana2julius('てんきすごくいいいいいい')
# => 't e N k i s u g o k u i:'
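
Some of the conversions above can be pictured as fixed code-point shifts between Unicode blocks: Katakana sits 0x60 above Hiragana, and the full-width ASCII forms sit 0xFEE0 above their half-width counterparts. The sketch below illustrates that idea only. It is not jaconv's implementation, and the function names and the ascii_chars/digits parameters are hypothetical stand-ins for the library's own arguments. Half-width Katakana is left out because it needs a lookup table rather than a single offset.

```python
KANA_SHIFT = 0x60     # Katakana U+30A1.. minus Hiragana U+3041..
WIDTH_SHIFT = 0xFEE0  # full-width forms U+FF01.. minus ASCII U+0021..


def hira2kata_sketch(text):
    """Shift Hiragana (U+3041-U+3096) into Katakana; leave other characters alone."""
    return "".join(
        chr(ord(ch) + KANA_SHIFT) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def kata2hira_sketch(text):
    """Shift Katakana (U+30A1-U+30F6) into Hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - KANA_SHIFT) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def h2z_sketch(text, ascii_chars=False, digits=False):
    """Widen printable ASCII and/or digits; both flags are hypothetical names."""
    out = []
    for ch in text:
        is_digit = "0" <= ch <= "9"
        is_ascii = "!" <= ch <= "~" and not is_digit
        if (digits and is_digit) or (ascii_chars and is_ascii):
            out.append(chr(ord(ch) + WIDTH_SHIFT))
        else:
            out.append(ch)  # includes the space, which has no fixed-offset partner
    return "".join(out)


print(hira2kata_sketch("ともえまみ"))  # トモエマミ
print(kata2hira_sketch("巴マミ"))      # 巴まみ
```

Kanji and any other characters outside the shifted ranges pass through unchanged, which matches the behaviour shown for kata2hira above.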

NOTE

The jaconv.normalize method extends unicodedata.normalize for Japanese language processing. It additionally applies the following replacements:

'〜' => 'ー'
'～' => 'ー'
"’" => "'"
'”' => '"'
'“' => '``'
'―' => '-'
'‐' => '-'
'˗' => '-'
'֊' => '-'
'‐' => '-'
'‑' => '-'
'‒' => '-'
'–' => '-'
'⁃' => '-'
'⁻' => '-'
'₋' => '-'
'−' => '-'
'﹣' => 'ー'
'－' => 'ー'
'—' => 'ー'
'―' => 'ー'
'━' => 'ー'
'─' => 'ー'
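
A minimal sketch of how such a replacement table can be layered on top of unicodedata.normalize, using only the standard library. It covers a small subset of the table above, the name normalize_sketch is hypothetical, and applying the replacements before the Unicode normalization step is an assumption of this sketch rather than a statement about jaconv's internals.

```python
import unicodedata

# Illustrative subset of the replacement table; not the full list.
_REPLACEMENTS = str.maketrans({
    "\u301c": "\u30fc",  # WAVE DASH -> prolonged sound mark
    "\u2019": "'",       # RIGHT SINGLE QUOTATION MARK -> apostrophe
    "\u201d": '"',       # RIGHT DOUBLE QUOTATION MARK -> straight quote
    "\u201c": "``",      # LEFT DOUBLE QUOTATION MARK -> two backticks
    "\u2010": "-",       # HYPHEN -> hyphen-minus
    "\u2013": "-",       # EN DASH -> hyphen-minus
    "\u2212": "-",       # MINUS SIGN -> hyphen-minus
    "\u2014": "\u30fc",  # EM DASH -> prolonged sound mark
    "\u2501": "\u30fc",  # BOX DRAWINGS HEAVY HORIZONTAL -> prolonged sound mark
})


def normalize_sketch(text, mode="NFKC"):
    """Apply the custom replacements, then standard Unicode normalization."""
    return unicodedata.normalize(mode, text.translate(_REPLACEMENTS))


print(normalize_sketch("ティロ・フィナ\u301cレ"))  # ティロ・フィナーレ
```

With mode="NFKC", the standard normalization step also folds half-width Katakana and full-width ASCII to their canonical widths, so the custom table only has to handle the dash and quote variants that NFKC leaves alone.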

More Repositories

1. neologdn (Cython, 265 stars): Japanese text normalizer for mecab-neologd
2. dataset-list (116 stars): lists of text corpus and more (mainly Japanese)
3. pymlask (Python, 111 stars): Emotion analyzer for Japanese text
4. oseti (Python, 90 stars): Dictionary based Sentiment Analysis for Japanese
5. misc (Jupyter Notebook, 35 stars): Machine Learning / Randomized Algorithm and more
6. mozcpy (Python, 34 stars): Mozc for Python: Kana-Kanji converter
7. flati (Python, 28 stars): Flatten nested iterable object for Python (Pure-Python implementation)
8. madoka-python (C++, 25 stars): Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)
9. oll-python (C++, 22 stars): Online machine learning algorithms (based on OLL C++ library)
10. shellinford-python (C++, 22 stars): Wavelet Matrix/Tree succinct data structure for full text search (based on shellinford C++ library)
11. rakutenma-python (Python, 21 stars): Rakuten MA (Python version)
12. sengiri (Python, 21 stars): Yet another sentence-level tokenizer for the Japanese text
13. python-tr (Python, 14 stars): A Pure-Python implementation of the tr algorithm
14. asa-python (Python, 11 stars): Japanese Argument Structure Analyzer (ASA) client for Python
15. mecab-as-kkc (Python, 10 stars): Converting Mozc dictionary to MeCab dictionary for Kana-Kanji conversion (KKC)
16. coding-tips (10 stars): Notes for when something slips my mind
17. zunda-python (Python, 10 stars): Zunda: Japanese Enhanced Modality Analyzer client for Python.
18. jctconv (Python, 8 stars): Rename jctconv -> jaconv. Please use the jaconv
19. pytypo (Python, 7 stars): English spelling correction
20. morris_counter (Python, 5 stars): Memory-efficient probabilistic counter namely Morris Counter
21. udon (Python, 4 stars): Rename udon -> pytypo. Please use the pytypo
22. neologdn-java (Java, 4 stars): Japanese text normalizer for mecab-neologd
23. dotfiles (Shell, 3 stars)
24. csj-eval (Python, 3 stars): For evaluating speech recognition system using the Corpus of Spontaneous Japanese (CSJ)
25. kpy (Python, 2 stars): Keitai (Japanese mobile phone) model name extractor on Python
26. neologd-diff (Python, 2 stars): Write diff (added/removed entries) of mecab-ipadic-neologd between 2 versions
27. ikegami-yukino.github.io (HTML, 1 star): Profile of Yukino Ikegami
28. yascikit-learn (Python, 1 star): Yet another scikit-learn
29. mecab-python-windows (C++, 1 star)
30. notebooks (Jupyter Notebook, 1 star): Jupyter notebook