Janome
Janome is a Japanese morphological analysis engine written in pure Python.
General documentation:
https://mocobeta.github.io/janome/en/ (English)
https://mocobeta.github.io/janome/ (Japanese)
Requirements
Python 3.7+ is required.
Install
[Note] Building the package consumes about 500 MB of memory.
(venv) $ pip install janome
Run
(venv) $ python
>>> from janome.tokenizer import Tokenizer
>>> t = Tokenizer()
>>> for token in t.tokenize('すもももももももものうち'):
...     print(token)
...
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
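Each printed line follows the MeCab-IPADIC feature layout: the surface form, whitespace, then nine comma-separated features (four part-of-speech levels, inflection type, inflection form, base form, reading, phonetic). The sketch below splits one such line into named fields. It does not call Janome itself; `ParsedToken` and `parse_token_line` are hypothetical helper names introduced for illustration, and the sketch assumes every line carries exactly nine features.

```python
from dataclasses import dataclass


@dataclass
class ParsedToken:
    """Hypothetical container for one printed token line."""
    surface: str
    part_of_speech: str  # the four POS levels, comma-joined
    infl_type: str
    infl_form: str
    base_form: str
    reading: str
    phonetic: str


def parse_token_line(line: str) -> ParsedToken:
    """Split 'surface<whitespace>f1,...,f9' into named fields."""
    # The surface form is assumed to contain no whitespace.
    surface, features = line.split(maxsplit=1)
    fields = features.strip().split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 features, got {len(fields)}")
    return ParsedToken(
        surface=surface,
        part_of_speech=",".join(fields[:4]),
        infl_type=fields[4],
        infl_form=fields[5],
        base_form=fields[6],
        reading=fields[7],
        phonetic=fields[8],
    )


token = parse_token_line("うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ")
print(token.surface, token.part_of_speech, token.reading)
```

When working with Janome directly, parsing the printed text is usually unnecessary: each token object exposes the same information as attributes such as `surface`, `part_of_speech`, `base_form`, and `reading`.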
License
Janome is licensed under the Apache License 2.0 and uses the MeCab-IPADIC dictionary/statistical model.
See LICENSE.txt and NOTICE.txt for license details.
Acknowledgement
Special thanks to @ikawaha, @takuyaa, @nakagami and @janome_oekaki.
Copyright
Copyright(C) 2015-2023, Tomoko Uchida. All rights reserved.