Transformers-ru
A list of pretrained Transformer models for the Russian language (including multilingual models).
Code for using and visualizing the models comes from the following repos:
Models
There are models from:
- DeepPavlov project
- Hugging Face repository
- Facebook research
- Facebook's fairseq
- Denis Antyukhov Google Colab code
- Russian RuBERTa
Model description | # params | Config | Vocabulary | Model | BPE codes
---|---|---|---|---|---
BERT-Base, Multilingual Cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 170M | [huggingface] 1K | [huggingface] 973K | [huggingface] 682M |
BERT-Base, Multilingual Uncased: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters | 160M | [huggingface] 1K | [huggingface] 852K | [huggingface] 642M |
RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M |
SlavicBERT, Slavic (bg, cs, pl, ru), cased, 12-layer, 768-hidden, 12-heads, 180M parameters | 170M | | | [deeppavlov] 636M |
XLM (MLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2,9M [facebook] 1,5M | [huggingface] 1,3G [facebook] 1,3G | [huggingface] 1,4M [facebook] 1,4M
XLM (MLM+TLM) 15 languages | 237M | [huggingface] 1K | [huggingface] 2,9M [facebook] 1,5M | [huggingface] 661M [facebook] 665M | [huggingface] 1,4M [facebook] 1,4M
XLM (MLM) 17 languages | | | [facebook] 2,9M | [facebook] 1,1G | [facebook] 2,9M
XLM (MLM) 100 languages | | | [facebook] 3,0M | [facebook] 1,1G | [facebook] 2,9M
Denis Antyukhov BERT-Base, Russian, Uncased, 12-layer, 768-hidden, 12-heads | 176M | | | [bert_resourses] 1,9G |
Facebook-FAIR's WMT'19 en-ru | | | | [fairseq] 12G |
Facebook-FAIR's WMT'19 ru-en | | | | [fairseq] 12G |
Facebook-FAIR's WMT'19 ru | | | | [fairseq] 2,1G |
Russian RuBERTa | | | | [Google Drive] 247M |
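The Hugging Face checkpoints above can be loaded directly through the transformers library. A minimal sketch for the first row, assuming a recent transformers version (`bert-base-multilingual-cased` is the standard hub id for that checkpoint):

```python
# Minimal sketch: load one of the models from the table via Hugging Face transformers.
from transformers import BertTokenizer, BertModel

name = "bert-base-multilingual-cased"  # hub id of BERT-Base, Multilingual Cased
tokenizer = BertTokenizer.from_pretrained(name)
model = BertModel.from_pretrained(name)

inputs = tokenizer("Привет, мир!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```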
Converting TensorFlow models to PyTorch
Downloading and converting the DeepPavlov model:
```
$ wget 'http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz'
$ tar -xzf rubert_cased_L-12_H-768_A-12_v1.tar.gz
$ python3 convert_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt \
    --bert_config_file rubert_cased_L-12_H-768_A-12_v1/bert_config.json \
    --pytorch_dump_path rubert_cased_L-12_H-768_A-12_v1/bert_model.bin
```
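After conversion, the checkpoint can be loaded with the usual PyTorch BERT classes. A rough sketch, assuming the transformers library and that the config and vocab files from the archive sit next to the converted weights (transformers expects the default file names `pytorch_model.bin` and `config.json`, hence the renames):

```python
# Sketch: load the converted RuBERT checkpoint with transformers.
import os

src = "rubert_cased_L-12_H-768_A-12_v1"
os.rename(os.path.join(src, "bert_model.bin"), os.path.join(src, "pytorch_model.bin"))
os.rename(os.path.join(src, "bert_config.json"), os.path.join(src, "config.json"))

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained(src)  # reads vocab.txt from the directory
model = BertModel.from_pretrained(src)          # reads config.json + pytorch_model.bin
```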
Models comparison
There are scripts to train and evaluate models on the Sber SQuAD dataset for the Russian language [download dataset].
Comparison of BERT models trained on the Sber SQuAD dataset (a sketch of the EM and F-1 metrics follows the table):
Model | EM (dev) | F-1 (dev) |
---|---|---|
BERT-Base, Multilingual Cased | 64.85 | 83.68 |
BERT-Base, Multilingual Uncased | 64.73 | 83.25 |
RuBERT | 66.38 | 84.58 |
SlavicBERT | 65.23 | 83.68 |
RuBERTa-base | 59.45 | 78.60 |
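EM (exact match) and F-1 are the standard SQuAD metrics: EM is the share of questions where the predicted answer span matches a gold answer exactly, and F-1 gives partial credit for token-level overlap. A simplified sketch (whitespace tokenization only; the official SQuAD script also lowercases and strips punctuation):

```python
# Simplified sketch of the SQuAD EM and F-1 metrics used in the table above.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip() == gold.strip())

def f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # shared token counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1("в 1961 году", "1961 году"))  # 0.8: partial credit for overlapping tokens
```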
Visualization
The attention-head view visualization from BertViz:
The model view visualization from BertViz:
The neuron view visualization from BertViz:
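All three views can be produced in a Jupyter notebook. A minimal sketch for the attention-head view, assuming the bertviz and transformers packages are installed (the sentence is an arbitrary example):

```python
# Sketch: BertViz attention-head view for a Russian sentence (run in Jupyter).
from transformers import BertTokenizer, BertModel
from bertviz import head_view

name = "bert-base-multilingual-cased"
tokenizer = BertTokenizer.from_pretrained(name)
model = BertModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("Кошка сидит на ковре", return_tensors="pt")
attention = model(**inputs).attentions  # tuple: one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

head_view(attention, tokens)  # renders the interactive view in the notebook
```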
Generative models
GPT-2 models
Mikhail Grankin's model
Code: https://github.com/mgrankin/ru_transformers
Download models:
```
pip install awscli
aws s3 sync --no-sign-request s3://models.dobro.ai/gpt2/ru/unfreeze_all gpt2
```
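As a rough illustration of generating text from GPT-2 weights in the Hugging Face format (the `gpt2` path is the sync target from the command above; note that ru_transformers ships its own tokenizer, so whether the standard GPT2Tokenizer loads it directly is an assumption, check that repo's README):

```python
# Rough sketch of text generation with a transformers-format GPT-2 checkpoint.
# NOTE: the direct GPT2Tokenizer load is an assumption; mgrankin/ru_transformers
# documents its own tokenizer and loading pipeline.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "gpt2"  # directory populated by the aws s3 sync command above
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir)

input_ids = tokenizer.encode("Однажды утром", return_tensors="pt")
output = model.generate(input_ids, max_length=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0]))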
Vladimir Larin's model
- Code: https://github.com/vlarine/ruGPT2
- Model: gpt2_345m.tgz (4,2G)
RNN Models
There are some RNN models for the Russian language.
ELMo
DeepPavlov
- ELMo on Russian Wikipedia: [config], [model]
- ELMo on Russian WMT News: [config], [model]
- ELMo on Russian Twitter: [config], [model]
RusVectōrēs
- RNC and Wikipedia. December 2018 (tokens): [model]
- RNC and Wikipedia. December 2018 (lemmas): [model]
- Taiga 2048. December 2019 (lemmas): [model]
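A downloaded ELMo archive can be queried for contextual token vectors with the simple_elmo package maintained by the RusVectōrēs team. A minimal sketch, assuming `pip install simple_elmo` and an unpacked model (the directory name below is hypothetical, use the path of whichever model you downloaded):

```python
# Sketch: contextual token vectors from a downloaded ELMo model via simple_elmo.
from simple_elmo import ElmoModel

model = ElmoModel()
model.load("ruwiki_elmo_model")  # hypothetical path to the unpacked model directory

sentences = [["Это", "пример", "предложения"]]
vectors = model.get_elmo_vectors(sentences)  # shape: (n_sentences, max_len, dim)
print(vectors.shape)
```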