  • Stars: 283
  • Rank: 146,066 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created: over 4 years ago
  • Updated: 4 months ago

Repository Details

Japanese to romaji converter in Python

cutlet

cutlet by Irasutoya

Cutlet is a tool to convert Japanese to romaji. Check out the interactive demo! Also see the docs and the original blog post.

You do not need to write issues in English.

Features:

  • support for Modified Hepburn, Kunreisiki, Nihonsiki systems
  • custom overrides for individual mappings
  • custom overrides for specific words
  • built-in exceptions list (Tokyo, Osaka, etc.)
  • uses foreign spelling when available in UniDic
  • proper nouns are capitalized
  • slug mode for URL generation
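
To illustrate how word-level exceptions and per-kana mapping overrides can interact, here is a minimal sketch. It is not cutlet's implementation: the tables `KANA` and `EXCEPTIONS` and the function `romanize_word` are hypothetical names invented for this example, and the tables cover only a handful of kana.

```python
# Hypothetical, tiny tables for illustration only; a real converter needs far more.
KANA = {"と": "to", "う": "u", "きょ": "kyo", "ふ": "fu", "じ": "ji"}
EXCEPTIONS = {"とうきょう": "Tokyo"}


def romanize_word(word, mapping=KANA, exceptions=EXCEPTIONS):
    """Romanize one hiragana word: whole-word exceptions win over kana mapping."""
    if word in exceptions:
        return exceptions[word]
    out = []
    i = 0
    while i < len(word):
        # Try two-character digraphs (e.g. きょ) before single kana.
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # Pass characters that are not in the table through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(romanize_word("とうきょう"))                      # Tokyo (word exception)
print(romanize_word("とうきょう", exceptions={}))       # toukyou (plain mapping)
print(romanize_word("ふじ", mapping={**KANA, "ふ": "hu"}))  # huji (mapping override)
```

The design choice shown here is only the lookup order: a whole-word exception is checked first, and individual kana mappings apply when no exception matches.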

Things not supported:

  • traditional Hepburn n-to-m: Shimbashi
  • macrons or circumflexes: Tōkyō, Tôkyô
  • passport Hepburn: Satoh (but you can use an exception)
  • hyphenating words
  • Traditional Hepburn in general is not supported
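
If you need the traditional Hepburn n-to-m spelling, one possible workaround is to post-process the output yourself. The sketch below is an assumption-laden example, not a cutlet feature: `n_to_m` is a hypothetical helper that rewrites n as m before b, m, or p in an already romanized string.

```python
import re


def n_to_m(romaji):
    """Rewrite n as m before b, m, or p (traditional Hepburn style)."""
    return re.sub(r"n(?=[bmp])", "m", romaji)


print(n_to_m("Shinbashi"))  # Shimbashi
print(n_to_m("kanpai"))     # kampai
```

Note that this rule is applied blindly, so it would also change foreign spellings (for example "input" would become "imput"). Apply it only to words you know are romanized Japanese.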

Internally, cutlet uses fugashi, so you can use the same dictionary you use for normal tokenization.

Installation

Cutlet can be installed through pip as usual.

pip install cutlet

Note that if you don't have a MeCab dictionary installed, you'll also have to install one. If you're just getting started, unidic-lite is a good choice.

pip install unidic-lite

Usage

A command-line script is included for quick testing. Run cutlet, and each line of stdin is treated as a sentence. You can specify the system to use (hepburn, kunrei, nippon, or nihon) as the first argument.

$ cutlet
ローマ字変換プログラム作ってみた。
Roma ji henkan program tsukutte mita.
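
The line-by-line behavior described above can be sketched roughly as follows. This is not the actual script: `romanize_stream` and `fake_converter` are hypothetical names, the default of hepburn when no argument is given is an assumption, and a stand-in converter is used so the example runs without cutlet or a dictionary installed.

```python
import io

SYSTEMS = ("hepburn", "kunrei", "nippon", "nihon")


def romanize_stream(argv, stdin, stdout, make_converter):
    """Convert each input line as one sentence, using the system named in argv[1]."""
    # Assumption: fall back to hepburn when no system argument is given.
    system = argv[1] if len(argv) > 1 else "hepburn"
    if system not in SYSTEMS:
        raise ValueError(f"unknown system: {system}")
    convert = make_converter(system)
    for line in stdin:
        stdout.write(convert(line.rstrip("\n")) + "\n")


def fake_converter(system):
    """Stand-in for a real converter; it only tags each sentence with the system."""
    return lambda sentence: f"[{system}] {sentence}"


out = io.StringIO()
romanize_stream(["cutlet", "kunrei"], io.StringIO("一行目\n二行目\n"), out, fake_converter)
print(out.getvalue(), end="")
```

With cutlet installed, a factory such as `lambda system: cutlet.Cutlet(system).romaji` could presumably be passed in place of `fake_converter`.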

In code:

import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'

# You can also generate a slug suitable for URLs
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'

# You can disable using foreign spelling too
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'

# Kunreisiki and Nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'

# Comparison of the three systems on the same sentence
nkatu = cutlet.Cutlet('nihon')

sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
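
To make the differences between the three systems more concrete, here is a small self-contained sketch of the kana where they commonly diverge. It is not how cutlet works internally: the tables `COMMON` and `DIFF` and the function `romanize` are hypothetical and cover only a few hiragana, with no tokenization or particle handling.

```python
# Kana that are romanized the same way in all three systems (tiny sample).
COMMON = {"く": "ku", "か": "ka", "み": "mi"}

# Kana whose standard romanization differs between systems.
DIFF = {
    "hepburn": {"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu",
                "じ": "ji", "ぢ": "ji", "づ": "zu"},
    "kunrei": {"し": "si", "ち": "ti", "つ": "tu", "ふ": "hu",
               "じ": "zi", "ぢ": "zi", "づ": "zu"},
    "nihon": {"し": "si", "ち": "ti", "つ": "tu", "ふ": "hu",
              "じ": "zi", "ぢ": "di", "づ": "du"},
}


def romanize(text, system="hepburn"):
    """Map each kana through the chosen system's table; unknown characters pass through."""
    table = {**COMMON, **DIFF[system]}
    return "".join(table.get(ch, ch) for ch in text)


for system in ("hepburn", "kunrei", "nihon"):
    print(system, romanize("つづく", system), romanize("ふじ", system))
# hepburn tsuzuku fuji
# kunrei tuzuku huzi
# nihon tuduku huzi
```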

Alternatives

  • kakasi: Historically important, but not updated since 2014.
  • pykakasi: Self-contained; it does segmentation on its own and uses its own dictionary.
  • kuroshiro: JavaScript-based.
  • kana: Go-based.

More Repositories

  • fugashi: A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis. (C++, 384 stars)
  • posuto: 🏣📮〠 Japanese postal code data. (Python, 201 stars)
  • unidic-py: Unidic packaged for installation via pip. (Python, 74 stars)
  • ndl-crop: Script for cropping photos from the NDL. (Python, 37 stars)
  • unidic-lite: A small version of UniDic for easy pip installs. (Python, 36 stars)
  • showmemore: SHOW ME MORE OF [-----] (Python, 28 stars)
  • ipadic-py: IPAdic packaged for easy use from Python. (Python, 25 stars)
  • awesome-digital-collections: Publicly accessible digital collections. (19 stars)
  • palladian-facades: 🏛️ Palladian Facade Generator for ProcJam2015 (LiveScript, 19 stars)
  • multilang-filter: Script for preprocessing multilingual Markdown. (Python, 14 stars)
  • deltos: A magic notepad. δ (LiveScript, 13 stars)
  • gamefaces: Public domain headshots (12 stars)
  • dupdupdraw: Forthish drawing system with random program generation. (JavaScript, 11 stars)
  • node-migemo: Japanese search regex generator (LiveScript, 7 stars)
  • chargen: Random generator taking literature as input. (Python, 7 stars)
  • ja-tokenizer-benchmark: Compare the speed of various Japanese tokenizers in Python. (Python, 7 stars)
  • philtre: Search objects with a familiar syntax. (LiveScript, 6 stars)
  • jp-ner: [abandoned] Work on generating an NER dataset for Japanese (Python, 5 stars)
  • jumandic-py: JumanDic packaged for use with PyPI. (Python, 3 stars)
  • shesha: Random generator toolkit (JavaScript, 3 stars)
  • awesome-gamedev-jp: A collection of links useful for game development (3 stars)
  • bontan.ls: Bontan is a simple scraper primarily intended for articles. (LiveScript, 2 stars)
  • lua-mecab: Lua wrapper for Mecab Japanese morphological analyzer. (C++, 2 stars)
  • fugashi-streamlit-demo: Streamlit demo for fugashi (Python, 2 stars)
  • gutenjuice: Top books from Project Gutenberg, in raw form and extracted. (2 stars)
  • bookoff-redirect: Deal with BookOff query parameter nonsense. (HTML, 2 stars)
  • fugashi-sagemaker-demo: A basic introduction to using fugashi for Japanese tokenization. (Jupyter Notebook, 2 stars)
  • github-tasks.vim: Github task plugin for vim (Vim Script, 2 stars)
  • mecab-packed: [broken/wip] Bundled mecab & unidic for installing via pip. (Shell, 1 star)
  • language-disruptor: Randomly replace words in Japanese sentences. (Python, 1 star)
  • poine-tool: Tools related to POINE (Python, 1 star)
  • bontan: Get embed code for a link, using OEmbed as appropriate. (Nim, 1 star)
  • yuzulabo.works: Yuzu Labo web site (CSS, 1 star)
  • mecab-manylinux1-wheel-builder: Build manylinux1 wheels with MeCab installed. (Shell, 1 star)
  • deltos.vim: A vim plugin for use with Deltos. (Vim Script, 1 star)
  • kanji: Kanji data package for Python (Python, 1 star)
  • visidata-conll: CoNLL-U data loader for Visidata. (Python, 1 star)
  • everybayes: Document classification for everyone. (Python, 1 star)
  • jfmt.lua: Tool for wrapping Japanese text to natural width (Lua, 1 star)
  • searchy: [discontinued] Simple interactive search for Node (LiveScript, 1 star)