• Stars: 152
• Rank: 244,685 (Top 5%)
• Language: Python
• License: Apache License 2.0
• Created: about 7 years ago
• Updated: 6 months ago

Repository Details

Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.

rnnmorph

Important: please see https://github.com/natasha/slovnet#morphology-1

Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

Domain       | Full tag | PoS tag | Full tag + lemma | Sentence full tag | Sentence full tag + lemma
Lenta (news) | 96.31%   | 98.01%  | 92.96%           | 77.93%            | 52.79%
VK (social)  | 95.20%   | 98.04%  | 92.06%           | 74.30%            | 60.56%
JZ (lit.)    | 95.87%   | 98.71%  | 90.45%           | 73.10%            | 43.15%
All          | 95.81%   | 98.26%  | N/A              | 74.92%            | N/A

English language, UD EWT test, accuracy

Dataset     | Full tag | PoS tag | Full tag + lemma | Sentence full tag | Sentence full tag + lemma
UD EWT test | 91.57%   | 94.10%  | 87.02%           | 63.17%            | 50.99%
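
The tables report both per-token and per-sentence accuracy, which explains why the sentence-level figures are much lower. The sketch below illustrates the difference under one assumption: a sentence counts as correct only when every token in it is tagged correctly. The function name `tagging_accuracy` and the sample tags are hypothetical and are not part of rnnmorph or the official evaluation scripts.

```python
from typing import Sequence, Tuple


def tagging_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> Tuple[float, float]:
    """Return (token accuracy, sentence accuracy) for parallel tag sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    total_tokens = correct_tokens = correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must have the same number of tokens")
        matches = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_tokens += len(gold_sent)
        correct_tokens += matches
        # Assumption: a sentence is correct only if all of its tokens match.
        correct_sentences += int(matches == len(gold_sent))
    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Hypothetical gold and predicted PoS tags for two short sentences.
gold = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]
pred = [["NOUN", "VERB", "NOUN"], ["PRON", "NOUN"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
print(f"token accuracy: {token_acc:.2%}, sentence accuracy: {sentence_acc:.2%}")
```

A single wrong token out of five lowers token accuracy to 80% but halves sentence accuracy here, which mirrors the gap between the "Full tag" and "Sentence full tag" columns.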

Speed and memory consumption

Speed: 200 to 600 words per second on CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
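
To put the throughput figures in perspective, the following back-of-the-envelope sketch estimates processing time for a corpus at the two ends of the stated range. The helper `estimated_seconds` and the one-million-word corpus size are illustrative assumptions; actual speed depends on hardware and sentence length.

```python
def estimated_seconds(n_words: int, words_per_second: float) -> float:
    """Estimate wall-clock seconds needed to tag n_words at a fixed rate."""
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    return n_words / words_per_second


corpus_words = 1_000_000  # hypothetical corpus size
for rate in (200, 600):  # the CPU range quoted above
    minutes = estimated_seconds(corpus_words, rate) / 60
    print(f"{rate} words/s: about {minutes:.1f} minutes")
```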

Install

pip install rnnmorph

Usage

Example:

from rnnmorph.predictor import RNNMorphPredictor

# Load the pretrained Russian model.
predictor = RNNMorphPredictor(language="ru")
# Tag a pre-tokenized sentence; the result holds one form per input word.
forms = predictor.predict(["мама", "мыла", "раму"])
print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
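
The `tag` string in the example uses `Name=Value` pairs separated by `|`. A minimal sketch of turning such a string into a dictionary is shown below, so individual grammatical features can be looked up by name. The helper `parse_tag` is hypothetical and not part of rnnmorph; treating `_` as "no features" is an assumption borrowed from the CoNLL-U convention.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Name=Value|Name=Value' feature string into a dict."""
    # Assumption: an empty string or "_" means the word has no features.
    if not tag or tag == "_":
        return {}
    return dict(pair.split("=", 1) for pair in tag.split("|"))


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])    # Nom
print(features["Gender"])  # Fem
```

With the predictor loaded, the same helper could be applied to `forms[0].tag`.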

Training

Simple model training: a Colab notebook is linked from the project repository.

More Repositories

1. rulm (Jupyter Notebook, 446 stars): Language modeling and instruction tuning for Russian
2. rupo (Python, 177 stars): Library for analyzing and generating Russian-language poetry
3. summarus (Python, 170 stars): Models for automatic abstractive summarization
4. tgcontest (HTML, 94 stars): Telegram Data Clustering contest solution by Mindful Squirrel
5. ping_pong_bench (Python, 57 stars)
6. UNMT (Jupyter Notebook, 50 stars): Code inspired by Unsupervised Machine Translation Using Monolingual Corpora Only
7. PoetryCorpus (Python, 41 stars): Poetry corpus of the Russian language
8. saiga_bot (Python, 35 stars): Telegram bot for different language models. Supports system prompts and images
9. gazeta (Python, 30 stars): Gazeta: Dataset for automatic summarization of Russian news
10. saiga (Python, 26 stars)
11. HeadlineCause (Jupyter Notebook, 11 stars): A dataset of news headlines for detecting causalities
12. russ (Python, 10 stars): Package for word stress detection
13. rudetox (Python, 7 stars)
14. purano (Jupyter Notebook, 7 stars): News annotation and clustering
15. nghack (Jupyter Notebook, 6 stars): NG Hack solution by Mindful Squirrel
16. Algorithms (C++, 5 stars): Algorithms in C++ and C
17. IlyaGusev (4 stars)
18. quest (Python, 3 stars): Quantitative evalUation of modErn LLM Sampling Techniques
19. MIPT_Algo_Seminars (HTML, 3 stars): Seminar materials for the "Algorithms and Data Structures" course at MIPT FPMI
20. translate_api (Python, 2 stars)
21. aika (C++, 2 stars): Amateur level C++ chess engine with web GUI on top of lc0 board representation
22. SentiRuEval-2016 (Jupyter Notebook, 2 stars)
23. nlp-homework (Jupyter Notebook, 2 stars): Assignment for the NLP course
24. remotion (Jupyter Notebook, 1 star): Experiments in aspect-based sentiment analysis
25. Plotter (C++, 1 star): Graph plotter, MathML and TeX support