  • Stars: 134
  • Rank: 270,967 (Top 6 %)
  • Language: Jupyter Notebook
  • License: Apache License 2.0
  • Created over 3 years ago
  • Updated 12 months ago

Repository Details

CLIP implementation for Russian language

RuCLIP

Zero-shot image classification model for Russian language


RuCLIP (Russian Contrastive Language–Image Pretraining) is a multimodal model for computing similarities between images and texts and for ranking captions and pictures. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing, and multimodal learning. This repo contains prototype models of a Russian version of OpenAI's CLIP, following this paper.

Models

Installing

pip install ruclip==0.0.2

Usage

Colab notebooks:

  • Standard RuCLIP API
  • RuCLIP + SberVqgan
  • ONNX example

Init models

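import torch
import ruclip

# use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip, processor = ruclip.load('ruclip-vit-base-patch32-384', device=device)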

Zero-Shot Classification [Minimal Example]

import torch
import base64
import requests
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO

# prepare images
bs4_urls = requests.get('https://raw.githubusercontent.com/sberbank-ai/ru-dolph/master/pics/pipelines/cats_vs_dogs_bs4.json').json()
images = [Image.open(BytesIO(base64.b64decode(bs4_url))) for bs4_url in bs4_urls]

# prepare classes
classes = ['кошка', 'собака']
templates = ['{}', 'это {}', 'на картинке {}', 'это {}, домашнее животное']

# predict
predictor = ruclip.Predictor(clip, processor, device, bs=8, templates=templates)
with torch.no_grad():
    text_latents = predictor.get_text_latents(classes)
    pred_labels = predictor.run(images, text_latents)

# show results
f, ax = plt.subplots(2,4, figsize=(12,6))
for i, (pil_img, pred_label) in enumerate(zip(images, pred_labels)):
    ax[i//4, i%4].imshow(pil_img)
    ax[i//4, i%4].set_title(classes[pred_label])
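The example above relies on prompt templates and on comparing image and text latents. As a rough illustration of the idea only, the NumPy sketch below expands class names into prompts, averages the prompt embeddings per class, and turns cosine similarities into softmax scores. All function names here are hypothetical, the `logit_scale` of 100 is an assumed temperature, and this is not necessarily how `ruclip.Predictor` is implemented internally.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def expand_prompts(classes, templates):
    """Fill every template with every class name (one list of prompts per class)."""
    return [[t.format(c) for t in templates] for c in classes]


def class_latents(prompt_latents):
    """Average normalized prompt embeddings per class.

    prompt_latents has shape (n_classes, n_templates, dim);
    the result has shape (n_classes, dim) with unit-length rows.
    """
    return l2_normalize(l2_normalize(prompt_latents).mean(axis=1))


def zero_shot_predict(image_latents, text_latents, logit_scale=100.0):
    """Return (predicted labels, softmax scores) from cosine similarities."""
    sims = l2_normalize(image_latents) @ l2_normalize(text_latents).T
    logits = logit_scale * sims  # logit_scale is an assumed temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

Averaging over several templates tends to make the class embedding less sensitive to the exact wording of any single prompt, which is one common motivation for passing a list of templates rather than a bare class name.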

Cosine similarity Visualization Example

Softmax Scores Visualization Example

Linear Probe and ZeroShot Correlation Results

Linear Probe Example

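import os

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torchvision.datasets import CIFAR100

# download directory for CIFAR100 (assumed path; any writable directory works)
root = os.path.expanduser('~/.cache')
train = CIFAR100(root, download=True, train=True)
test = CIFAR100(root, download=True, train=False)

# encode the images with the predictor from the zero-shot example
with torch.no_grad():
    X_train = predictor.get_image_latents((pil_img for pil_img, _ in train)).cpu().numpy()
    X_test = predictor.get_image_latents((pil_img for pil_img, _ in test)).cpu().numpy()
    y_train, y_test = np.array(train.targets), np.array(test.targets)

# fit a logistic regression on the frozen image latents
clf = LogisticRegression(solver='lbfgs', penalty='l2', max_iter=1000, verbose=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# np.float was removed in NumPy 1.24; the mean of a boolean array is already a float
accuracy = np.mean(y_test == y_pred) * 100.
print(f"Accuracy = {accuracy:.3f}")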

>>> Accuracy = 75.680

Performance

We have evaluated the performance of zero-shot image classification on the following datasets:

| Dataset | ruCLIP Base [vit-base-patch32-224] | ruCLIP Base [vit-base-patch16-224] | ruCLIP Large [vit-large-patch14-224] | ruCLIP Base [vit-base-patch32-384] | ruCLIP Large [vit-large-patch14-336] | ruCLIP Base [vit-base-patch16-384] | CLIP [vit-base-patch16-224] original + OPUS-MT | CLIP [vit-base-patch16-224] original |
|---|---|---|---|---|---|---|---|---|
| Food101, acc | 0.505 | 0.552 | 0.597 | 0.642 | 0.712💥 | 0.689 | 0.664 | 0.883 |
| CIFAR10, acc | 0.818 | 0.810 | 0.878 | 0.862 | 0.906💥 | 0.845 | 0.859 | 0.893 |
| CIFAR100, acc | 0.504 | 0.496 | 0.511 | 0.529 | 0.591 | 0.569 | 0.603💥 | 0.647 |
| Birdsnap, acc | 0.115 | 0.117 | 0.172 | 0.161 | 0.213💥 | 0.195 | 0.126 | 0.396 |
| SUN397, acc | 0.452 | 0.462 | 0.484 | 0.510 | 0.523💥 | 0.521 | 0.447 | 0.631 |
| Stanford Cars, acc | 0.433 | 0.487 | 0.559 | 0.572 | 0.659💥 | 0.626 | 0.567 | 0.638 |
| DTD, acc | 0.380 | 0.401 | 0.370 | 0.390 | 0.408 | 0.421💥 | 0.243 | 0.432 |
| MNIST, acc | 0.447 | 0.464 | 0.337 | 0.404 | 0.242 | 0.478 | 0.559💥 | 0.559 |
| STL10, acc | 0.932 | 0.932 | 0.934 | 0.946 | 0.956 | 0.964 | 0.967💥 | 0.970 |
| PCam, acc | 0.501 | 0.505 | 0.520 | 0.506 | 0.554 | 0.501 | 0.603💥 | 0.573 |
| CLEVR, acc | 0.148 | 0.128 | 0.152 | 0.188 | 0.142 | 0.132 | 0.240💥 | 0.240 |
| Rendered SST2, acc | 0.489 | 0.527 | 0.529 | 0.508 | 0.539💥 | 0.525 | 0.484 | 0.484 |
| ImageNet, acc | 0.375 | 0.401 | 0.426 | 0.451 | 0.488💥 | 0.482 | 0.392 | 0.638 |
| FGVC Aircraft, mean-per-class | 0.033 | 0.043 | 0.046 | 0.053 | 0.075 | 0.046 | 0.220💥 | 0.244 |
| Oxford Pets, mean-per-class | 0.560 | 0.595 | 0.604 | 0.587 | 0.546 | 0.635💥 | 0.507 | 0.874 |
| Caltech101, mean-per-class | 0.786 | 0.775 | 0.777 | 0.834 | 0.835💥 | 0.835💥 | 0.792 | 0.883 |
| Flowers102, mean-per-class | 0.401 | 0.388 | 0.455 | 0.449 | 0.517💥 | 0.452 | 0.357 | 0.697 |
| Hateful Memes, roc-auc | 0.564 | 0.516 | 0.530 | 0.537 | 0.519 | 0.543 | 0.579💥 | 0.589 |
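The zero-shot results use three metrics: plain accuracy (acc), mean-per-class accuracy, and ROC-AUC. The sketch below shows one common way the first two could be computed with NumPy. The function names are illustrative, and this is an assumption about the metric definitions rather than the evaluation code used to produce the numbers above.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class recalls, so every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


# toy example: an imbalanced set where the model always predicts class 0
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(accuracy(y_true, y_pred))                 # 0.75
print(mean_per_class_accuracy(y_true, y_pred))  # 0.5
```

The two metrics diverge on imbalanced datasets, which is presumably why mean-per-class is reported for some of them. For a binary task such as Hateful Memes, ROC-AUC could be computed with `sklearn.metrics.roc_auc_score` on the positive-class scores.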

And for linear-probe evaluation:

| Dataset | ruCLIP Base [vit-base-patch32-224] | ruCLIP Base [vit-base-patch16-224] | ruCLIP Large [vit-large-patch14-224] | ruCLIP Base [vit-base-patch32-384] | ruCLIP Large [vit-large-patch14-336] | ruCLIP Base [vit-base-patch16-384] | CLIP [vit-base-patch16-224] original |
|---|---|---|---|---|---|---|---|
| Food101 | 0.765 | 0.827 | 0.840 | 0.851 | 0.896💥 | 0.890 | 0.901 |
| CIFAR10 | 0.917 | 0.922 | 0.927 | 0.934 | 0.943💥 | 0.942 | 0.953 |
| CIFAR100 | 0.716 | 0.739 | 0.734 | 0.745 | 0.770 | 0.773💥 | 0.808 |
| Birdsnap | 0.347 | 0.503 | 0.567 | 0.434 | 0.609 | 0.612💥 | 0.664 |
| SUN397 | 0.683 | 0.721 | 0.731 | 0.721 | 0.759💥 | 0.758 | 0.777 |
| Stanford Cars | 0.697 | 0.776 | 0.797 | 0.766 | 0.831 | 0.840💥 | 0.866 |
| DTD | 0.690 | 0.734 | 0.711 | 0.703 | 0.731 | 0.749💥 | 0.770 |
| MNIST | 0.963 | 0.974💥 | 0.949 | 0.965 | 0.949 | 0.971 | 0.989 |
| STL10 | 0.957 | 0.962 | 0.973 | 0.968 | 0.981💥 | 0.974 | 0.982 |
| PCam | 0.827 | 0.823 | 0.791 | 0.835 | 0.807 | 0.846💥 | 0.830 |
| CLEVR | 0.356 | 0.360 | 0.358 | 0.308 | 0.318 | 0.378💥 | 0.604 |
| Rendered SST2 | 0.603 | 0.655 | 0.651 | 0.651 | 0.637 | 0.661💥 | 0.606 |
| FGVC Aircraft | 0.254 | 0.312 | 0.290 | 0.283 | 0.341 | 0.362💥 | 0.604 |
| Oxford Pets | 0.774 | 0.820 | 0.819 | 0.730 | 0.753 | 0.856💥 | 0.931 |
| Caltech101 | 0.904 | 0.917 | 0.914 | 0.922 | 0.937💥 | 0.932 | 0.956 |
| HatefulMemes | 0.545 | 0.568 | 0.563 | 0.581 | 0.585💥 | 0.578 | 0.645 |

We also compared inference speed on the CIFAR100 dataset, using an Nvidia V100 for evaluation:

| | ruclip-vit-base-patch32-224 | ruclip-vit-base-patch16-224 | ruclip-vit-large-patch14-224 | ruclip-vit-base-patch32-384 | ruclip-vit-large-patch14-336 | ruclip-vit-base-patch16-384 |
|---|---|---|---|---|---|---|
| iter/sec | 308.84💥 | 155.35 | 49.95 | 147.26 | 22.11 | 61.79 |
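For readers who want to reproduce a comparable measurement, the following is a minimal sketch of how throughput in iterations per second might be timed. The helper `iterations_per_second` and its parameters are hypothetical, and the source does not state what counts as one iteration or how the numbers above were actually measured.

```python
import time


def iterations_per_second(step, n_iters=100, warmup=10):
    """Call step() repeatedly and return the measured iterations per second."""
    # warm-up calls are excluded so one-time setup costs do not skew the result
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# toy usage with a cheap CPU-bound step standing in for one model call
rate = iterations_per_second(lambda: sum(range(1000)), n_iters=20, warmup=2)
print(f"{rate:.2f} iter/sec")
```

When timing a model on a GPU, `step` would need to wait for the device to finish (for example by calling `torch.cuda.synchronize()` at the end of each call), because CUDA kernels run asynchronously and the timer would otherwise stop too early.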
