Transformers-ru
A list of pretrained Transformer models for the Russian language (including multilingual models).
Code for using and visualizing the models comes from the following repos:
Models
There are models from:
- DeepPavlov project
- Hugging Face repository
- Facebook research
- Facebook's fairseq
- Denis Antyukhov Google Colab code
- Russian RuBERTa
Model description | # params | Config | Vocabulary | Model | BPE codes
---|---|---|---|---|---
BERT-Base, Multilingual Cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 170M | [huggingface] 1K | [huggingface] 973K | [huggingface] 682M |
BERT-Base, Multilingual Uncased: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 160M | [huggingface] 1K | [huggingface] 852K | [huggingface] 642M |
RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M |
SlavicBERT, Slavic (bg, cs, pl, ru), cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M |
XLM (MLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2,9M [facebook] 1,5M | [huggingface] 1,3G [facebook] 1,3G | [huggingface] 1,4M [facebook] 1,4M
XLM (MLM+TLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2,9M [facebook] 1,5M | [huggingface] 661M [facebook] 665M | [huggingface] 1,4M [facebook] 1,4M
XLM (MLM) 17 languages | | | [facebook] 2,9M | [facebook] 1,1G | [facebook] 2,9M
XLM (MLM) 100 languages | | | [facebook] 3,0M | [facebook] 1,1G | [facebook] 2,9M
Denis Antyukhov BERT-Base, Russian, Uncased, 12-layer, 768-hidden, 12-heads | 176M | | | [bert_resourses] 1,9G |
Facebook-FAIR's WMT'19 en-ru | | | | [fairseq] 12G |
Facebook-FAIR's WMT'19 ru-en | | | | [fairseq] 12G |
Facebook-FAIR's WMT'19 ru | | | | [fairseq] 2,1G |
Russian RuBERTa | | | | [Google Drive] 247M |
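The Hugging Face checkpoints above can be loaded directly through the transformers library. A minimal sketch for the first row, assuming a recent transformers version (`bert-base-multilingual-cased` is the standard hub id for that checkpoint):

```python
# Minimal sketch: load one of the models from the table via Hugging Face transformers.
from transformers import BertTokenizer, BertModel

name = "bert-base-multilingual-cased"  # hub id of BERT-Base, Multilingual Cased
tokenizer = BertTokenizer.from_pretrained(name)
model = BertModel.from_pretrained(name)

inputs = tokenizer("Привет, мир!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```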
Converting TensorFlow models to PyTorch
Downloading and converting the DeepPavlov model:
```
$ wget 'http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz'
$ tar -xzf rubert_cased_L-12_H-768_A-12_v1.tar.gz
$ python3 convert_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt \
    --bert_config_file rubert_cased_L-12_H-768_A-12_v1/bert_config.json \
    --pytorch_dump_path rubert_cased_L-12_H-768_A-12_v1/bert_model.bin
```
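After conversion, the checkpoint can be loaded with the usual PyTorch BERT classes. A rough sketch, assuming the transformers library and that the config and vocab files from the archive sit next to the converted weights (transformers expects the default file names `pytorch_model.bin` and `config.json`, hence the renames):

```python
# Sketch: load the converted RuBERT checkpoint with transformers.
import os

src = "rubert_cased_L-12_H-768_A-12_v1"
os.rename(os.path.join(src, "bert_model.bin"), os.path.join(src, "pytorch_model.bin"))
os.rename(os.path.join(src, "bert_config.json"), os.path.join(src, "config.json"))

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained(src)  # reads vocab.txt from the directory
model = BertModel.from_pretrained(src)          # reads config.json + pytorch_model.bin
```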
Models comparison
There are scripts to train and evaluate models on the Sber SQuAD dataset for the Russian language [download dataset].
Comparison of BERT models trained on the Sber SQuAD dataset (a sketch of the EM and F-1 metrics follows the table):
Model | EM (dev) | F-1 (dev) |
---|---|---|
BERT-Base, Multilingual Cased | 64.85 | 83.68 |
BERT-Base, Multilingual Uncased | 64.73 | 83.25 |
RuBERT | 66.38 | 84.58 |
SlavicBERT | 65.23 | 83.68 |
RuBERTa-base | 59.45 | 78.60 |
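EM (exact match) and F-1 are the standard SQuAD metrics: EM is the share of questions where the predicted answer span matches a gold answer exactly, and F-1 gives partial credit for token-level overlap. A simplified sketch (whitespace tokenization only; the official SQuAD script also lowercases and strips punctuation):

```python
# Simplified sketch of the SQuAD EM and F-1 metrics used in the table above.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip() == gold.strip())

def f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # shared token counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1("в 1961 году", "1961 году"))  # 0.8: partial credit for overlapping tokens
```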
Visualization
The attention-head view visualization from BertViz:
The model view visualization from BertViz:
The neuron view visualization from BertViz:
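All three views can be produced in a Jupyter notebook. A minimal sketch for the attention-head view, assuming the bertviz and transformers packages are installed (the sentence is an arbitrary example):

```python
# Sketch: BertViz attention-head view for a Russian sentence (run in Jupyter).
from transformers import BertTokenizer, BertModel
from bertviz import head_view

name = "bert-base-multilingual-cased"
tokenizer = BertTokenizer.from_pretrained(name)
model = BertModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("Кошка сидит на ковре", return_tensors="pt")
attention = model(**inputs).attentions  # tuple: one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

head_view(attention, tokens)  # renders the interactive view in the notebook
```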
Generative models
GPT-2 models
Mikhail Grankin's model
Code: https://github.com/mgrankin/ru_transformers
Download models:
```
pip install awscli
aws s3 sync --no-sign-request s3://models.dobro.ai/gpt2/ru/unfreeze_all gpt2
```
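As a rough illustration of generating text from GPT-2 weights in the Hugging Face format (the `gpt2` path is the sync target from the command above; note that ru_transformers ships its own tokenizer, so whether the standard GPT2Tokenizer loads it directly is an assumption, check that repo's README):

```python
# Rough sketch of text generation with a transformers-format GPT-2 checkpoint.
# NOTE: the direct GPT2Tokenizer load is an assumption; mgrankin/ru_transformers
# documents its own tokenizer and loading pipeline.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "gpt2"  # directory populated by the aws s3 sync command above
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir)

input_ids = tokenizer.encode("Однажды утром", return_tensors="pt")
output = model.generate(input_ids, max_length=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0]))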
Vladimir Larin's model
- Code: https://github.com/vlarine/ruGPT2
- Model: gpt2_345m.tgz (4,2G)
RNN Models
There are some RNN models for the Russian language.
ELMo
DeepPavlov
- ELMo on Russian Wikipedia: [config], [model]
- ELMo on Russian WMT News: [config], [model]
- ELMo on Russian Twitter: [config], [model]
RusVectōrēs
- RNC and Wikipedia. December 2018 (tokens): [model]
- RNC and Wikipedia. December 2018 (lemmas): [model]
- Taiga 2048. December 2019 (lemmas): [model]
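A downloaded ELMo archive can be queried for contextual token vectors with the simple_elmo package maintained by the RusVectōrēs team. A minimal sketch, assuming `pip install simple_elmo` and an unpacked model (the directory name below is hypothetical, use the path of whichever model you downloaded):

```python
# Sketch: contextual token vectors from a downloaded ELMo model via simple_elmo.
from simple_elmo import ElmoModel

model = ElmoModel()
model.load("ruwiki_elmo_model")  # hypothetical path to the unpacked model directory

sentences = [["Это", "пример", "предложения"]]
vectors = model.get_elmo_vectors(sentences)  # shape: (n_sentences, max_len, dim)
print(vectors.shape)
```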