kuroshiro

Join the chat at https://gitter.im/hexenq/kuroshiro

kuroshiro is a Japanese language library for converting Japanese sentences to Hiragana, Katakana, or Romaji, with furigana and okurigana modes supported.

Read this in other languages: English, 日本語, 简体中文, 繁體中文, Esperanto.

Demo

You can check the demo here.

Feature

  • Japanese Sentence => Hiragana, Katakana or Romaji
  • Furigana and okurigana supported
  • 🆕 Multiple morphological analyzers supported
  • 🆕 Multiple romanization systems supported
  • Useful Japanese utils

Breaking Change in 1.x

  • Separate the morphological analyzer from the phonetic notation logic, so that different morphological analyzers (ready-made or customized) can be used
  • Embrace ES8/ES2017 to use async/await functions
  • Use ES6 Module instead of CommonJS

Ready-made Analyzer Plugins

Check the environment compatibility of each analyzer before you start working with it.

| Analyzer | Node.js Support | Browser Support | Plugin Repo | Developer |
|---|---|---|---|---|
| Kuromoji | ✓ | ✓ | kuroshiro-analyzer-kuromoji | Hexen Qi |
| Mecab | ✓ | ✗ | kuroshiro-analyzer-mecab | Hexen Qi |
| Yahoo Web API | ✓ | ✗ | kuroshiro-analyzer-yahoo-webapi | Hexen Qi |

Usage

Node.js (or a module bundler such as Webpack)

Install with the npm package manager:

$ npm install kuroshiro

Load the library:

ES6 module import is supported:

import Kuroshiro from "kuroshiro";
// This example uses the kuromoji analyzer, so npm install and import it first
import KuromojiAnalyzer from "kuroshiro-analyzer-kuromoji";

// Instantiate
const kuroshiro = new Kuroshiro();
// Initialize kuroshiro with an analyzer instance (see init(analyzer) in the API section for more information)
// async/await is used here (inside an async function or an ES module); you could also use a Promise
await kuroshiro.init(new KuromojiAnalyzer());
// Convert what you want
const result = await kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", { to: "hiragana" });

CommonJS require is also supported:

const Kuroshiro = require("kuroshiro");
const KuromojiAnalyzer = require("kuroshiro-analyzer-kuromoji");
const kuroshiro = new Kuroshiro();

// Initialize first, then convert once the analyzer is ready
kuroshiro.init(new KuromojiAnalyzer())
    .then(function(){
        return kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", { to: "hiragana" });
    })
    .then(function(result){
        console.log(result);
    });

Browser

Add dist/kuroshiro.min.js to your frontend project (you may first build it from source with npm run build after npm install), and in your HTML:

<script src="url/to/kuroshiro.min.js"></script>

For this example, you should also include kuroshiro-analyzer-kuromoji.min.js, which you can get from kuroshiro-analyzer-kuromoji:

<script src="url/to/kuroshiro-analyzer-kuromoji.min.js"></script>

Instantiate:

var kuroshiro = new Kuroshiro();

Initialize kuroshiro with an instance of analyzer, then convert what you want:

kuroshiro.init(new KuromojiAnalyzer({ dictPath: "url/to/dictFiles" }))
    .then(function () {
        return kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", { to: "hiragana" });
    })
    .then(function(result){
        console.log(result);
    })

API

Constructor

Examples

const kuroshiro = new Kuroshiro();

Instance Methods

init(analyzer)

Initialize kuroshiro with an analyzer instance. Import and instantiate an analyzer first; you can use one of the ready-made analyzers listed above. Refer to each analyzer's documentation for its initialization instructions.

Arguments

  • analyzer - An instance of analyzer.

Examples

await kuroshiro.init(new KuromojiAnalyzer());
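
Since convert() depends on an initialized analyzer, an application will usually want to call init() once and reuse the instance. The following is a minimal sketch of that pattern, not something kuroshiro ships: createConverter is a hypothetical helper, and the Kuroshiro class and the analyzer are passed in by the caller, so the only calls it assumes are the constructor, init(analyzer) and convert(str, options) described in this section.

```javascript
// Hypothetical helper (not part of kuroshiro): runs init() once and reuses the instance.
// KuroshiroClass and analyzer come from the caller, e.g. Kuroshiro and new KuromojiAnalyzer().
function createConverter(KuroshiroClass, analyzer) {
    const kuroshiro = new KuroshiroClass();
    let ready = null; // promise of the single init() call, created on first use

    return function convert(str, options) {
        if (ready === null) {
            ready = kuroshiro.init(analyzer);
        }
        // Every call waits on the same init() promise before converting.
        return ready.then(function () {
            return kuroshiro.convert(str, options);
        });
    };
}

// Usage (assuming both packages are installed):
// const convert = createConverter(Kuroshiro, new KuromojiAnalyzer());
// convert("感じ取れたら", { to: "hiragana" }).then(console.log);
```

In this sketch a failed init() is not retried; every later call rejects with the same error.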

convert(str, [options])

Convert the given string to the target syllabary, using the options described below.

Arguments

  • str - A String to be converted.
  • options - Optional. kuroshiro supports the following convert options:

| Options | Type | Default | Description |
|---|---|---|---|
| to | String | "hiragana" | Target syllabary [hiragana, katakana, romaji] |
| mode | String | "normal" | Convert mode [normal, spaced, okurigana, furigana] |
| romajiSystem* | String | "hepburn" | Romanization system [nippon, passport, hepburn] |
| delimiter_start | String | "(" | Delimiter (start) |
| delimiter_end | String | ")" | Delimiter (end) |

*: The romajiSystem param is only applied when the value of the to param is "romaji". For more about it, check the Romanization System section.
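
To make the defaults concrete, here is a small sketch of how caller-supplied options could be merged with the defaults from the options table. resolveConvertOptions is a hypothetical helper written for illustration; it is not kuroshiro's internal code, and whether the library rejects unknown values in the same way is an assumption.

```javascript
// Defaults copied from the options table above.
const DEFAULT_CONVERT_OPTIONS = {
    to: "hiragana",
    mode: "normal",
    romajiSystem: "hepburn",
    delimiter_start: "(",
    delimiter_end: ")"
};

// Allowed values, also taken from the table.
const ALLOWED_VALUES = {
    to: ["hiragana", "katakana", "romaji"],
    mode: ["normal", "spaced", "okurigana", "furigana"],
    romajiSystem: ["nippon", "passport", "hepburn"]
};

// Hypothetical helper: fills in defaults and rejects values the table does not list.
function resolveConvertOptions(options) {
    const resolved = Object.assign({}, DEFAULT_CONVERT_OPTIONS, options);
    Object.keys(ALLOWED_VALUES).forEach(function (key) {
        if (!ALLOWED_VALUES[key].includes(resolved[key])) {
            throw new Error("Invalid value for '" + key + "': " + resolved[key]);
        }
    });
    return resolved;
}

// Only `to` is given, so every other option falls back to its default.
console.log(resolveConvertOptions({ to: "romaji" }));
```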

Examples

// normal
await kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", {mode:"normal", to:"hiragana"});
// result: かんじとれたらてをつなごう、かさなるのはじんせいのライン and レミリアさいこう！
// spaced
await kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", {mode:"spaced", to:"hiragana"});
// result: かんじとれ たら て を つなご う 、 かさなる の は じんせい の ライン   and   レミ リア さいこう ！
// okurigana
await kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", {mode:"okurigana", to:"hiragana"});
// result: 感(かん)じ取(と)れたら手(て)を繋(つな)ごう、重(かさ)なるのは人生(じんせい)のライン and レミリア最高(さいこう)！
// furigana
await kuroshiro.convert("感じ取れたら手を繋ごう、重なるのは人生のライン and レミリア最高！", {mode:"furigana", to:"hiragana"});
// result (shown as rendered; the returned string is HTML ruby markup): 感(かん)じ取(と)れたら手(て)を繋(つな)ごう、重(かさ)なるのは人生(じんせい)のライン and レミリア最高(さいこう)！
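
The difference between the okurigana and furigana modes, and the role of delimiter_start and delimiter_end, can be illustrated with a small sketch. It assumes the text has already been split into segments with readings (kuroshiro obtains these from the analyzer), and it assumes furigana output uses ruby/rt/rp elements in the shape shown. The segments array and both render functions are hypothetical, not kuroshiro's implementation.

```javascript
// Hypothetical pre-segmented input: kanji runs carry a reading, kana runs do not.
const segments = [
    { text: "感", reading: "かん" },
    { text: "じ" },
    { text: "取", reading: "と" },
    { text: "れたら" }
];

// okurigana-style output: the reading follows the kanji, wrapped in the delimiters.
function renderOkurigana(segs, delimiterStart = "(", delimiterEnd = ")") {
    return segs.map(function (seg) {
        return seg.reading ? seg.text + delimiterStart + seg.reading + delimiterEnd : seg.text;
    }).join("");
}

// furigana-style output: the reading goes into <rt>, and the delimiters go into <rp>
// so that browsers without ruby support still show them.
function renderFurigana(segs, delimiterStart = "(", delimiterEnd = ")") {
    return segs.map(function (seg) {
        if (!seg.reading) {
            return seg.text;
        }
        return "<ruby>" + seg.text + "<rp>" + delimiterStart + "</rp><rt>" + seg.reading +
            "</rt><rp>" + delimiterEnd + "</rp></ruby>";
    }).join("");
}

console.log(renderOkurigana(segments));           // 感(かん)じ取(と)れたら
console.log(renderOkurigana(segments, "[", "]")); // 感[かん]じ取[と]れたら
console.log(renderFurigana(segments));
```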

Utils

Examples

const result = Kuroshiro.Util.isHiragana("あ");

isHiragana(char)

Check if input char is hiragana.

isKatakana(char)

Check if input char is katakana.

isKana(char)

Check if input char is kana.

isKanji(char)

Check if input char is kanji.

isJapanese(char)

Check if input char is Japanese.

hasHiragana(str)

Check if input string has hiragana.

hasKatakana(str)

Check if input string has katakana.

hasKana(str)

Check if input string has kana.

hasKanji(str)

Check if input string has kanji.

hasJapanese(str)

Check if input string has Japanese.
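
As a rough illustration of how character checks like these can work, the sketch below tests code points against Unicode blocks. The ranges are the Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF) and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks; whether kuroshiro uses exactly these ranges is an assumption, and the sketch* names are hypothetical.

```javascript
// True when the first code point of `char` lies within [start, end].
function inRange(char, start, end) {
    const code = char.codePointAt(0);
    return code >= start && code <= end;
}

// Assumed ranges: whole Unicode blocks, not necessarily what kuroshiro checks.
const sketchIsHiragana = (char) => inRange(char, 0x3040, 0x309f);
const sketchIsKatakana = (char) => inRange(char, 0x30a0, 0x30ff);
const sketchIsKana = (char) => sketchIsHiragana(char) || sketchIsKatakana(char);
const sketchIsKanji = (char) => inRange(char, 0x4e00, 0x9fff);
const sketchIsJapanese = (char) => sketchIsKana(char) || sketchIsKanji(char);

// The has* variants are true when at least one character passes the check.
const sketchHas = (str, predicate) => Array.from(str).some((ch) => predicate(ch));
const sketchHasKana = (str) => sketchHas(str, sketchIsKana);
const sketchHasKanji = (str) => sketchHas(str, sketchIsKanji);

console.log(sketchIsHiragana("あ"));      // true
console.log(sketchHasKanji("ひらがな"));  // false
```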

kanaToHiragna(str)

Convert input kana string to hiragana.

kanaToKatakana(str)

Convert input kana string to katakana.
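
Conversions between the two kana scripts can be sketched as a code point shift, because each hiragana from ぁ (U+3041) to ゖ (U+3096) sits exactly 0x60 below its katakana counterpart. This is an illustration under that assumption only; the function names are hypothetical and kuroshiro's own implementation may differ.

```javascript
const KANA_OFFSET = 0x60; // distance between ぁ (U+3041) and ァ (U+30A1)

// Shifts every code point in [start, end] by `delta`; other characters pass through unchanged.
function shiftRange(str, start, end, delta) {
    return Array.from(str).map(function (ch) {
        const code = ch.codePointAt(0);
        return code >= start && code <= end ? String.fromCodePoint(code + delta) : ch;
    }).join("");
}

// Hypothetical stand-ins for the two util functions described above.
const sketchToKatakana = (str) => shiftRange(str, 0x3041, 0x3096, KANA_OFFSET);
const sketchToHiragana = (str) => shiftRange(str, 0x30a1, 0x30f6, -KANA_OFFSET);

console.log(sketchToKatakana("ひらがな")); // ヒラガナ
console.log(sketchToHiragana("ラーメン")); // らーめん (the long vowel mark ー is left as-is)
```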

kanaToRomaji(str, system)

Convert input kana string to romaji. Param system accepts "nippon", "passport", "hepburn" (Default: "hepburn").

Romanization System

kuroshiro supports three kinds of romanization systems.

nippon: Nippon-shiki romanization. Refer to ISO 3602 Strict.

passport: Passport-shiki romanization. Refer to Japanese romanization table published by Ministry of Foreign Affairs of Japan.

hepburn: Hepburn romanization. Refer to BS 4812 : 1972.

There is a useful webpage for you to check the difference between these romanization systems.

Notice for Romaji Conversion

Fully automatic conversion from furigana directly to romaji is impossible, because furigana lacks information on pronunciation (refer to なぜ フリガナでは ダメなのか？).

For this reason, kuroshiro does not handle chōon (long vowels) when converting furigana (kana) directly to romaji, whichever romanization system is used. The only exception is the chōonpu (the long vowel mark), which is handled.

For example, converting the kana "こうし" to romaji yields "kousi", "koushi", and "koushi" with the nippon, passport, and hepburn romanization systems, respectively.
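
The example can be reproduced with a kana-by-kana lookup that never merges vowels. Kana alone cannot tell whether こう is a long vowel, as in 講師, or two separate syllables, as in 子牛, so the "u" is kept in every system. The sketch below uses a tiny table covering only these three kana; the table and naiveKanaToRomaji are hypothetical and far smaller than any real romanization table.

```javascript
// Illustrative table for the three kana in the example only.
const ROMAJI_TABLE = {
    nippon:   { "こ": "ko", "う": "u", "し": "si" },
    passport: { "こ": "ko", "う": "u", "し": "shi" },
    hepburn:  { "こ": "ko", "う": "u", "し": "shi" }
};

// Hypothetical helper: maps each kana independently, with no long vowel handling.
function naiveKanaToRomaji(str, system = "hepburn") {
    const table = ROMAJI_TABLE[system];
    if (!table) {
        throw new Error("Unknown romanization system: " + system);
    }
    return Array.from(str).map(function (ch) {
        return Object.prototype.hasOwnProperty.call(table, ch) ? table[ch] : ch;
    }).join("");
}

console.log(naiveKanaToRomaji("こうし", "nippon"));   // kousi
console.log(naiveKanaToRomaji("こうし", "passport")); // koushi
console.log(naiveKanaToRomaji("こうし", "hepburn"));  // koushi
```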

The kanji -> romaji conversion with/without furigana mode is unaffected by this logic.

Contributing

Please check CONTRIBUTING.

Inspired By

  • kuromoji
  • wanakana

License

MIT