


Janome


Janome is a Japanese morphological analysis engine written in pure Python.

General documentation:

https://mocobeta.github.io/janome/en/ (English)

https://mocobeta.github.io/janome/ (Japanese)

Requirements

Python 3.7+ is required.

Install

[Note] Installation needs about 500 MB of memory while the package is being built.

(venv) $ pip install janome

Run

(venv) $ python
>>> from janome.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> for token in t.tokenize('すもももももももものうち'):
...     print(token)
...
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も    助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
も    助詞,係助詞,*,*,*,*,も,モ,モ
もも  名詞,一般,*,*,*,*,もも,モモ,モモ
の    助詞,連体化,*,*,*,*,の,ノ,ノ
うち  名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ

License

Janome is licensed under the Apache License 2.0 and uses the MeCab-IPADIC dictionary/statistical model.

See LICENSE.txt and NOTICE.txt for license details.

Acknowledgement

Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.

Copyright

Copyright(C) 2015-2023, Tomoko Uchida. All rights reserved.