Glasses 😎

Compact, concise and customizable deep learning computer vision library

Models are stored on the Hugging Face Hub!

Doc is here

TL;DR

This library has

human readable code, no research code
common components are shared across models
same APIs for all models (you learn them once and they are always the same)
clear and easy-to-use model customization (see here)
classification and segmentation
emoji in the name ;)

Stuff implemented so far:

Installation

You can install glasses using pip by running

pip install git+https://github.com/FrancescoSaverioZuppichini/glasses

Motivations

Almost all existing implementations of the most famous models are written with very bad coding practices, what today is called research code. I struggled to understand some of them, even though in the end they were just a few lines of code.

Most of them lack a global structure, use tons of repeated code, are not easily customizable and are not tested. Since I do computer vision for a living, I needed a way to make my life easier.

Getting started

The APIs are shared across all models!

import torch
from glasses.models import AutoModel, AutoTransform
# load one model
model = AutoModel.from_pretrained('resnet18').eval()
# and its correct input transformation
tr = AutoTransform.from_name('resnet18')
model.summary(device='cpu')  # thanks to torchinfo

# at any time, see all the models
AutoModel.models_table()

            Models                 
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Name                   ┃ Pretrained ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ resnet18               │ true       │
│ resnet26               │ true       │
│ resnet26d              │ true       │
│ resnet34               │ true       │
│ resnet34d              │ true       │
│ resnet50               │ true       │
...
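Because every model is reachable through the same `from_pretrained`/`from_name` calls, an API like this is usually backed by a name-to-constructor registry. The sketch below shows that pattern in plain Python under our own assumptions; `ModelRegistry`, `register` and the toy builders are hypothetical and are not glasses internals (a real builder would return an `nn.Module`, not a dict).

```python
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Hypothetical registry: maps a model name (e.g. 'resnet18') to a builder function."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that stores a builder under `name`."""
        def decorator(builder: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._builders:
                raise ValueError(f"model '{name}' is already registered")
            self._builders[name] = builder
            return builder
        return decorator

    def from_name(self, name: str, **kwargs: Any) -> Any:
        """Build a model by name, forwarding keyword arguments to its builder."""
        if name not in self._builders:
            raise KeyError(f"unknown model '{name}', available: {self.names()}")
        return self._builders[name](**kwargs)

    def names(self) -> List[str]:
        return sorted(self._builders)


registry = ModelRegistry()


@registry.register("resnet18")
def resnet18(n_classes: int = 1000) -> dict:
    # a dict stands in for the model to keep the sketch dependency-free
    return {"arch": "resnet", "depths": [2, 2, 2, 2], "n_classes": n_classes}


@registry.register("resnet34")
def resnet34(n_classes: int = 1000) -> dict:
    return {"arch": "resnet", "depths": [3, 4, 6, 3], "n_classes": n_classes}


print(registry.names())                           # ['resnet18', 'resnet34']
print(registry.from_name("resnet18", n_classes=10))
# {'arch': 'resnet', 'depths': [2, 2, 2, 2], 'n_classes': 10}
```

Adding a model then only means registering one more builder; every caller keeps using the same two methods.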

Interpretability

import requests
from PIL import Image
from io import BytesIO
from glasses.interpretability import GradCam, SaliencyMap
from torchvision.transforms import Normalize
# get a cute dog 🐶
r = requests.get('https://i.insider.com/5df126b679d7570ad2044f3e?width=700&format=jpeg&auto=webp')
im = Image.open(BytesIO(r.content))
# un-normalize when done
mean, std = torch.as_tensor(tr.transforms[-1].mean), torch.as_tensor(tr.transforms[-1].std)
postprocessing = Normalize(-mean / std, (1.0 / std))
# apply preprocessing
x = tr(im).unsqueeze(0)
_ = model.interpret(x, using=GradCam(), postprocessing=postprocessing).show()
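The `postprocessing` transform works because normalizing a second time with mean `-mean / std` and std `1 / std` cancels the first normalization: (y + m/s) · s = s·y + m = x. A plain-Python check of that arithmetic on one pixel, using the standard ImageNet statistics (the same values appear in the efficientnet_b1 transform printed later in this README); the pixel value is arbitrary.

```python
# ImageNet per-channel statistics
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize(pixel, mean, std):
    """Per-channel (x - mean) / std, the formula torchvision's Normalize applies."""
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))


def inverse_normalize_params(mean, std):
    """Return (mean', std') such that normalizing with them undoes normalize(., mean, std)."""
    inv_mean = tuple(-m / s for m, s in zip(mean, std))
    inv_std = tuple(1.0 / s for s in std)
    return inv_mean, inv_std


pixel = (0.8, 0.5, 0.1)                   # an RGB pixel in [0, 1], as produced by ToTensor
normalized = normalize(pixel, MEAN, STD)
inv_mean, inv_std = inverse_normalize_params(MEAN, STD)
restored = normalize(normalized, inv_mean, inv_std)
print(restored)                           # approximately (0.8, 0.5, 0.1)
```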

Classification

from glasses.models import ResNet
from torch import nn
# change activation
model = AutoModel.from_pretrained('resnet18', activation = nn.SELU).eval()
# or directly from the model class
ResNet.resnet18(activation = nn.SELU)
# change number of classes
ResNet.resnet18(n_classes=100)
# freeze only the convolution weights
model = AutoModel.from_pretrained('resnet18')
model.freeze(who=model.encoder)

Get the inner features

# model.encoder has special hooks ready to be activated
# call the .features to trigger them
model.encoder.features
x = torch.randn((1, 3, 224, 224))
model(x)
[f.shape for f in model.encoder.features]

Change inner block

# what about resnet with inverted residuals?
from glasses.models.classification.efficientnet import InvertedResidualBlock
ResNet.resnet18(block = InvertedResidualBlock)

Segmentation

from functools import partial
from glasses.models.segmentation.unet import UNet, UNetDecoder
# vanilla Unet
unet = UNet()
# let's change the encoder
unet = UNet.from_encoder(partial(AutoModel.from_name, 'efficientnet_b1'))
# mmm I want more layers in the decoder!
unet = UNet(decoder=partial(UNetDecoder, widths=[256, 128, 64, 32, 16]))
# maybe resnet was better
unet = UNet(encoder=lambda **kwargs: ResNet.resnet26(**kwargs).encoder)
# same API
# unet.summary(input_shape=(1,224,224))

unet
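`UNet.from_encoder` and the `encoder=` argument both receive a callable rather than an already-built module, presumably so that UNet can supply its own keyword arguments when it builds the encoder. A dependency-free sketch of that factory pattern; `build_encoder`, `build_model`, `ToyUNet` and the `in_channels` keyword are hypothetical stand-ins, not the glasses signatures.

```python
from functools import partial
from typing import Any, Callable, Dict


def build_encoder(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for 'build an encoder by name'."""
    return {"name": name, **kwargs}


def build_model(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a full classifier whose encoder is one of its parts."""
    return {"encoder": build_encoder(name, **kwargs), "head": "classifier"}


class ToyUNet:
    """Receives an encoder *factory* and calls it with the arguments it needs."""

    def __init__(self, encoder: Callable[..., Dict[str, Any]], in_channels: int = 1) -> None:
        # the UNet chooses the constructor arguments; the caller only chooses what to build
        self.encoder = encoder(in_channels=in_channels)


# partial pre-binds the model name and leaves the remaining kwargs open
unet_a = ToyUNet(encoder=partial(build_encoder, "efficientnet_b1"))
# a lambda works too and can post-process the result, here keeping only the encoder
unet_b = ToyUNet(encoder=lambda **kwargs: build_model("resnet26", **kwargs)["encoder"], in_channels=3)

print(unet_a.encoder)   # {'name': 'efficientnet_b1', 'in_channels': 1}
print(unet_b.encoder)   # {'name': 'resnet26', 'in_channels': 3}
```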

More examples

# change the head
model = AutoModel.from_pretrained('resnet18')
my_head = nn.Sequential(
    nn.AdaptiveAvgPool2d((1,1)),
    nn.Flatten(),
    nn.Linear(model.encoder.widths[-1], 512),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.Linear(512, 1000))

model.head = my_head

x = torch.rand((1,3,224,224))
model(x).shape #torch.Size([1, 1000])
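Pooling, flattening, dropout and activation layers carry no weights, so the size of a custom head like `my_head` comes down to its `nn.Linear` layers. A quick count in plain Python, assuming `model.encoder.widths[-1]` is 512 for resnet18:

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Trainable parameters of a fully connected layer: the weight matrix plus an optional bias."""
    return in_features * out_features + (out_features if bias else 0)


ENCODER_WIDTH = 512  # assumption: model.encoder.widths[-1] for resnet18

# only the two Linear layers of my_head have parameters
custom_head = linear_params(ENCODER_WIDTH, 512) + linear_params(512, 1000)
# a single linear classifier on the same pooled features, for comparison
single_linear = linear_params(ENCODER_WIDTH, 1000)

print(custom_head)    # 775656
print(single_linear)  # 513000
```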

Pretrained Models

I am currently working on the pretrained models and on the best way to make them available.

This is a list of all the pretrained models available so far! They were all trained on ImageNet.

I used batch_size=64 and a GTX 1080 Ti to evaluate the models.

name	top1	top5	time	batch_size
vit_base_patch16_384	0.842	0.9722	1130.81	64
vit_large_patch16_224	0.82836	0.96406	893.486	64
eca_resnet50t	0.82234	0.96172	241.754	64
eca_resnet101d	0.82166	0.96052	213.632	64
efficientnet_b3	0.82034	0.9603	199.599	64
regnety_032	0.81958	0.95964	136.518	64
vit_base_patch32_384	0.8166	0.9613	243.234	64
vit_base_patch16_224	0.815	0.96018	306.686	64
deit_small_patch16_224	0.81082	0.95316	132.868	64
eca_resnet50d	0.80604	0.95322	135.567	64
resnet50d	0.80492	0.95128	97.5827	64
cse_resnet50	0.80292	0.95048	108.765	64
efficientnet_b2	0.80126	0.95124	127.177	64
eca_resnet26t	0.79862	0.95084	155.396	64
regnety_064	0.79712	0.94774	183.065	64
regnety_040	0.79222	0.94656	124.881	64
resnext101_32x8d	0.7921	0.94556	290.38	64
regnetx_064	0.79066	0.94456	176.3	64
wide_resnet101_2	0.7891	0.94344	277.755	64
regnetx_040	0.78486	0.94242	122.619	64
wide_resnet50_2	0.78464	0.94064	201.634	64
efficientnet_b1	0.7831	0.94096	98.7143	64
resnet152	0.7825	0.93982	186.191	64
regnetx_032	0.7792	0.93996	319.558	64
resnext50_32x4d	0.77628	0.9368	114.325	64
regnety_016	0.77604	0.93702	96.547	64
efficientnet_b0	0.77332	0.93566	67.2147	64
resnet101	0.77314	0.93556	134.148	64
densenet161	0.77146	0.93602	239.388	64
resnet34d	0.77118	0.93418	59.9938	64
densenet201	0.76932	0.9339	158.514	64
regnetx_016	0.76684	0.9328	91.7536	64
resnet26d	0.766	0.93188	70.6453	64
regnety_008	0.76238	0.93026	54.1286	64
resnet50	0.76012	0.92934	89.7976	64
densenet169	0.75628	0.9281	127.077	64
resnet26	0.75394	0.92584	65.5801	64
resnet34	0.75096	0.92246	56.8985	64
regnety_006	0.75068	0.92474	55.5611	64
regnetx_008	0.74788	0.92194	57.9559	64
densenet121	0.74472	0.91974	104.13	64
deit_tiny_patch16_224	0.7437	0.91898	66.662	64
vgg19_bn	0.74216	0.91848	169.357	64
regnety_004	0.73766	0.91638	68.4893	64
regnetx_006	0.73682	0.91568	81.4703	64
vgg16_bn	0.73476	0.91536	150.317	64
vgg19	0.7236	0.9085	155.851	64
regnetx_004	0.72298	0.90644	58.0049	64
vgg16	0.71628	0.90368	135.398	64
vgg13_bn	0.71618	0.9036	129.077	64
efficientnet_lite0	0.7041	0.89894	62.4211	64
vgg11_bn	0.70408	0.89724	86.9459	64
vgg13	0.69984	0.89306	116.052	64
regnety_002	0.6998	0.89422	46.804	64
resnet18	0.69644	0.88982	46.2029	64
vgg11	0.68872	0.88658	79.4136	64
regnetx_002	0.68658	0.88244	45.9211	64
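The top1 and top5 columns are accuracies: the fraction of evaluation images whose true label is the highest-scoring class, or among the five highest-scoring classes. A minimal plain-Python version of that metric; it follows the standard definition and is not the evaluation script used to produce the table.

```python
from typing import List, Sequence


def top_k_accuracy(logits: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    hits = 0
    for scores, target in zip(logits, targets):
        # class indices sorted by score, highest first
        ranked: List[int] = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# three samples, four classes
logits = [
    [0.1, 2.0, 0.3, 0.5],   # best: class 1
    [1.5, 0.2, 1.4, 0.0],   # best: class 0, second: class 2
    [0.0, 0.1, 0.2, 3.0],   # best: class 3, class 0 is last
]
targets = [1, 2, 0]

print(top_k_accuracy(logits, targets, k=1))  # 1/3: only the first sample is right
print(top_k_accuracy(logits, targets, k=2))  # 2/3: the second sample is now counted too
```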

Assuming you want to load efficientnet_b1:

from glasses.models import EfficientNet, AutoModel, AutoTransform

# load it using AutoModel
model = AutoModel.from_pretrained('efficientnet_b1').eval()
# or from its own class
model = EfficientNet.efficientnet_b1(pretrained=True)
# you may also need to get the correct transformation that must be applied on the input
tr = AutoTransform.from_name('efficientnet_b1')

In this case, tr is

Compose(
    Resize(size=240, interpolation=PIL.Image.BICUBIC)
    CenterCrop(size=(240, 240))
    ToTensor()
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
)
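With a single integer, Resize(size=240) scales the image so that its shorter side is 240 pixels while keeping the aspect ratio, and CenterCrop then cuts out the central 240 × 240 square. A plain-Python sketch of that geometry, using the truncation and rounding we understand torchvision to apply; the 700 × 466 input is just an example.

```python
from typing import Tuple


def resize_shorter_side(width: int, height: int, size: int) -> Tuple[int, int]:
    """New (width, height) when the shorter side is scaled to `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


w, h = resize_shorter_side(700, 466, 240)
print((w, h))                       # (360, 240)
print(center_crop_box(w, h, 240))   # (60, 0, 300, 240)
```

Since the crop size equals the resize target, the crop only ever trims the longer side.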

Deep Customization

All models are composed of shareable parts:

Block
Layer
Encoder
Head
Decoder

Block

Each model has its own building block, whose class name ends in Block (*Block). In each block, all the weights are stored in the .block field. This makes it very easy to customize a specific model.

from glasses.models.classification.vgg import VGGBasicBlock
from glasses.models.classification.resnet import ResNetBasicBlock, ResNetBottleneckBlock, ResNetBasicPreActBlock, ResNetBottleneckPreActBlock
from glasses.models.classification.senet import SENetBasicBlock, SENetBottleneckBlock
from glasses.models.classification.resnetxt import ResNetXtBottleNeckBlock
from glasses.models.classification.densenet import DenseBottleNeckBlock
from glasses.models.classification.wide_resnet import WideResNetBottleNeckBlock
from glasses.models.classification.efficientnet import EfficientNetBasicBlock

For example, if we want to add Squeeze and Excitation to the ResNet bottleneck block, we can simply subclass it:

from glasses.nn.att import SpatialSE
from glasses.models.classification.resnet import ResNetBottleneckBlock

class SEResNetBottleneckBlock(ResNetBottleneckBlock):
    def __init__(self, in_features: int, out_features: int, squeeze: int = 16, *args, **kwargs):
        super().__init__(in_features, out_features, *args, **kwargs)
        # all the weights are in block, we want to apply se after the weights
        self.block.add_module('se', SpatialSE(out_features, reduction=squeeze))
        
SEResNetBottleneckBlock(32, 64)
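The added `se` module applies Squeeze-and-Excitation channel attention: average each channel down to one number, pass those numbers through a small bottleneck whose size is set by the reduction ratio (`squeeze` above), and rescale each channel by a sigmoid gate. The sketch below is the textbook formulation in plain Python with random weights and no biases, assuming `SpatialSE` is this original SE block as the se_resnet naming suggests; it illustrates the idea and is not glasses' code.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def se_gates(x: List[Matrix], w1: Matrix, w2: Matrix) -> List[float]:
    """Channel gates of a Squeeze-and-Excitation module (biases omitted for brevity).

    x has shape [C][H][W]; w1 has shape [C // r][C] and w2 has shape [C][C // r],
    where r is the reduction ratio.
    """
    # squeeze: global average pooling gives one descriptor per channel
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # excitation: FC down to C // r values, ReLU, FC back up to C values, sigmoid
    z = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, z)))) for row in w2]


def squeeze_excite(x: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Rescale every value of channel c by its gate; the shape is unchanged."""
    gates = se_gates(x, w1, w2)
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


random.seed(0)
C, H, W, reduction = 4, 3, 3, 2
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0, 1) for _ in range(C)] for _ in range(C // reduction)]
w2 = [[random.gauss(0, 1) for _ in range(C // reduction)] for _ in range(C)]

print(se_gates(x, w1, w2))              # four values, each strictly between 0 and 1
y = squeeze_excite(x, w1, w2)
print(len(y), len(y[0]), len(y[0][0]))  # 4 3 3
```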

Then, we can use the class methods to create new models following the existing architecture blueprint. For example, to create se_resnet50:

ResNet.resnet50(block=SEResNetBottleneckBlock)

The cool thing is that every model has the same API: if I want to create a vgg13 with the SEResNetBottleneckBlock, I can just

from glasses.models import VGG
model = VGG.vgg13(block=SEResNetBottleneckBlock)
model.summary()

Some models require additional block parameters. For example, MobileNetV2 also requires an expansion parameter, so our SEResNetBottleneckBlock won't work with it.

Layer

A Layer stacks multiple blocks together following some logic. For example, ResNetLayer:

from glasses.models.classification.resnet import ResNetLayer

ResNetLayer(64, 128, depth=2)
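A common way for a layer to stack `depth` blocks, and the convention we assume here for ResNetLayer(64, 128, depth=2), is that only the first block changes the number of features (and, in ResNet, usually downsamples), while the remaining blocks keep it. A plain-Python sketch of that wiring with a pluggable block class; `ToyBlock`, `build_layer` and the stride rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyBlock:
    """Stand-in for a *Block: only records its input/output features and stride."""
    in_features: int
    out_features: int
    stride: int = 1


def build_layer(in_features: int, out_features: int, depth: int = 1,
                block: Callable[..., ToyBlock] = ToyBlock) -> List[ToyBlock]:
    """Stack `depth` blocks; the first maps in -> out features, the others keep out."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # assumption: downsample in the first block whenever the width changes
    stride = 2 if in_features != out_features else 1
    blocks = [block(in_features, out_features, stride=stride)]
    blocks += [block(out_features, out_features) for _ in range(depth - 1)]
    return blocks


for b in build_layer(64, 128, depth=2):
    print(b)
# ToyBlock(in_features=64, out_features=128, stride=2)
# ToyBlock(in_features=128, out_features=128, stride=1)
```

Because the block is a parameter, any class with a compatible constructor can be swapped in, which is the same idea as passing `block=` to the model constructors.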

Encoder

The encoder is the part of the model that encodes the input into features, i.e. the convolutional layers. It always has two very important parameters:

widths
depths

widths is the width of each layer, i.e. how many features it outputs; depths is the depth of each layer, i.e. how many blocks it contains.

For example, ResNetEncoder creates one ResNetLayer per entry of widths and depths. Let's see an example:

from glasses.models.classification.resnet import ResNetEncoder
# 3 layers, with 32, 64, 128 features and 1, 2, 3 blocks respectively
ResNetEncoder(
    widths=[32,64,128],
    depths=[1,2,3])
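Since the encoder builds one layer per entry, widths and depths need the same length, and in the usual layout each layer's input width is the previous layer's output width. A plain-Python sketch of that bookkeeping for the example above; `encoder_plan` and the stem width `start_features` are assumptions for illustration, not ResNetEncoder's actual code.

```python
from typing import List, Tuple


def encoder_plan(widths: List[int], depths: List[int],
                 start_features: int = 64) -> List[Tuple[int, int, int]]:
    """One (in_features, out_features, n_blocks) triple per layer.

    The first layer consumes the stem output (`start_features`, a hypothetical default);
    every later layer consumes the previous layer's width.
    """
    if len(widths) != len(depths):
        raise ValueError("widths and depths must have the same length")
    ins = [start_features] + widths[:-1]
    return list(zip(ins, widths, depths))


plan = encoder_plan(widths=[32, 64, 128], depths=[1, 2, 3], start_features=32)
for in_f, out_f, n in plan:
    print(f"{in_f:>4} -> {out_f:<4} {n} block(s)")
print("total blocks:", sum(n for _, _, n in plan))  # total blocks: 6
```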

All encoders are subclasses of Encoder, which allows us to hook into specific stages to get the features. All you have to do is first access .features to notify the model that you want to receive the features, and then pass an input.

enc = ResNetEncoder()
enc.features
enc(torch.randn((1,3,224,224)))
print([f.shape for f in enc.features])
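The "access `.features` first, then run the model" protocol can be pictured as a flag that tells the forward pass to keep intermediate outputs. Below is a framework-free sketch with numbers standing in for tensors; `ToyEncoder` is hypothetical (the real Encoder uses hooks, as the comment in "Get the inner features" says), and it keeps every stage output except the last, mirroring the four features listed in the Features section.

```python
from typing import Callable, List, Sequence


class ToyEncoder:
    """Runs stages in sequence; once `.features` has been accessed, keeps intermediate outputs."""

    def __init__(self, stages: Sequence[Callable[[float], float]]) -> None:
        self.stages = list(stages)
        self._record = False
        self._features: List[float] = []

    @property
    def features(self) -> List[float]:
        # the first access switches recording on; later accesses return the last call's outputs
        self._record = True
        return self._features

    def __call__(self, x: float) -> float:
        self._features = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            # keep every stage output except the last one, which is returned instead
            if self._record and i < len(self.stages) - 1:
                self._features.append(x)
        return x


enc = ToyEncoder([lambda v: v + 1, lambda v: v * 2, lambda v: v * 10])
enc.features              # switch recording on
out = enc(1.0)
print(out, enc.features)  # 40.0 [2.0, 4.0]
```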

Remember that every model always has an .encoder field. The encoder knows its number of output features, which you can access through its widths:

from glasses.models import ResNet

model = ResNet.resnet18()
model.encoder.widths[-1]

Features

Each encoder can return a list of features, accessible through the .features field. You need to access it once beforehand to notify the encoder that it should also store the features.

import torch
from glasses.models.classification.resnet import ResNetEncoder

x = torch.randn(1, 3, 224, 224)
enc = ResNetEncoder()
enc.features  # call it once
enc(x)
features = enc.features  # now we have all the features from each layer (stage)
for f in features:
    print(f.shape)
# torch.Size([1, 64, 112, 112])
# torch.Size([1, 64, 56, 56])
# torch.Size([1, 128, 28, 28])
# torch.Size([1, 256, 14, 14])
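These spatial sizes follow the standard ResNet layout: a stride-2 7×7 stem convolution, a stride-2 max pool, and a stride-2 convolution at the start of each later stage, so every feature map is half the size of the previous one. A plain-Python check with the usual output-size formula floor((n + 2p - k) / s) + 1, assuming that layout; the final 7 is presumably the encoder's output rather than one of the four stored features.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution or pooling window along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = conv_out(224, kernel=7, stride=2, padding=3)  # stem 7x7 conv, stride 2 -> 112
sizes = [size]
size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pool, stride 2 -> 56 (layer 1 keeps it)
sizes.append(size)
for _ in range(3):                                    # layers 2-4 each start with a stride-2 3x3 conv
    size = conv_out(size, kernel=3, stride=2, padding=1)
    sizes.append(size)
print(sizes)  # [112, 56, 28, 14, 7]
```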

Head

The head is the last part of the model; it usually performs the classification.

from glasses.models.classification.resnet import ResNetHead


ResNetHead(512, n_classes=1000)
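A classification head of this kind typically reduces each encoder channel to a single number by global average pooling and maps the result to one score per class with a linear layer, the same idea as the custom head in "More examples". A toy plain-Python version with 2 channels and 3 classes; it assumes that layout rather than reproducing ResNetHead's code.

```python
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # [C][H][W]


def global_avg_pool(x: FeatureMap) -> List[float]:
    """[C][H][W] -> [C]: the mean of each channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def linear(v: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """y = W v + b, with W of shape [n_classes][in_features]."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def classify(x: FeatureMap, weight: Sequence[Sequence[float]], bias: Sequence[float]) -> int:
    """Pool, score every class, and return the index of the highest score."""
    logits = linear(global_avg_pool(x), weight, bias)
    return max(range(len(logits)), key=logits.__getitem__)


x = [[[1.0, 3.0], [1.0, 3.0]],    # channel 0: mean 2.0
     [[0.0, 0.0], [0.0, 4.0]]]    # channel 1: mean 1.0
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, -2.5]
print(linear(global_avg_pool(x), weight, bias))  # [2.0, 1.0, 0.5]
print(classify(x, weight, bias))                 # 0
```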

Decoder

The decoder takes the last feature from the .encoder and decodes it. This is usually done in segmentation models, such as UNet.

import torch
from glasses.models.classification.resnet import ResNetEncoder
from glasses.models.segmentation.unet import UNetDecoder

x = torch.randn(1, 3, 224, 224)
enc = ResNetEncoder()
enc.features # call it once
x = enc(x)
features = enc.features
# we need to tell the decoder the first feature size and the size of the lateral features
dec = UNetDecoder(start_features=enc.widths[-1],
                  lateral_widths=enc.features_widths[::-1])
out = dec(x, features[::-1])
out.shape
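The features are reversed because a UNet-style decoder goes from the deepest, smallest feature map back to the shallowest one. A plain-Python shape sketch of a generic decoder in which each stage upsamples by 2 and concatenates the lateral feature of matching size; `decoder_plan` is hypothetical, the 512-channel 7×7 encoder output is an assumption for a resnet18-style encoder on a 224×224 input, and UNetDecoder may wire its stages differently.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, int]  # (channels, spatial size) of a square feature map


def decoder_plan(start: Shape, laterals: List[Shape], widths: List[int]) -> List[Tuple[int, int, int]]:
    """Return (in_channels, out_channels, spatial size) for every decoder stage.

    Each stage upsamples x2 and, if a lateral feature is left, concatenates it along the
    channel axis before producing `widths[i]` channels.
    """
    channels, size = start
    plan = []
    for i, out_channels in enumerate(widths):
        size *= 2
        skip: Optional[Shape] = laterals[i] if i < len(laterals) else None
        if skip is not None:
            if skip[1] != size:
                raise ValueError(f"stage {i}: lateral is {skip[1]}px, upsampled map is {size}px")
            channels += skip[0]
        plan.append((channels, out_channels, size))
        channels = out_channels
    return plan


# encoder features from the Features section, shallowest first
features = [(64, 112), (64, 56), (128, 28), (256, 14)]
encoder_output = (512, 7)  # assumption, see the lead-in
for stage in decoder_plan(encoder_output, features[::-1], widths=[256, 128, 64, 32, 16]):
    print(stage)
# (768, 256, 14)
# (384, 128, 28)
# (192, 64, 56)
# (128, 32, 112)
# (32, 16, 224)
```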

This object-oriented structure allows most of the code to be reused across models. The table below lists every model with its number of parameters and its size.

name	Parameters	Size (MB)
cse_resnet101	49,326,872	188.17
cse_resnet152	66,821,848	254.91
cse_resnet18	11,778,592	44.93
cse_resnet34	21,958,868	83.77
cse_resnet50	28,088,024	107.15
deit_base_patch16_224	87,184,592	332.58
deit_base_patch16_384	87,186,128	357.63
deit_small_patch16_224	22,359,632	85.3
deit_tiny_patch16_224	5,872,400	22.4
densenet121	7,978,856	30.44
densenet161	28,681,000	109.41
densenet169	14,149,480	53.98
densenet201	20,013,928	76.35
eca_resnet101d	44,568,563	212.62
eca_resnet101t	44,566,027	228.65
eca_resnet18d	16,014,452	98.41
eca_resnet18t	1,415,684	37.91
eca_resnet26d	16,014,452	98.41
eca_resnet26t	16,011,916	114.44
eca_resnet50d	25,576,350	136.65
eca_resnet50t	25,573,814	152.68
efficientnet_b0	5,288,548	20.17
efficientnet_b1	7,794,184	29.73
efficientnet_b2	9,109,994	34.75
efficientnet_b3	12,233,232	46.67
efficientnet_b4	19,341,616	73.78
efficientnet_b5	30,389,784	115.93
efficientnet_b6	43,040,704	164.19
efficientnet_b7	66,347,960	253.1
efficientnet_b8	87,413,142	505.01
efficientnet_l2	480,309,308	2332.13
efficientnet_lite0	4,652,008	17.75
efficientnet_lite1	5,416,680	20.66
efficientnet_lite2	6,092,072	23.24
efficientnet_lite3	8,197,096	31.27
efficientnet_lite4	13,006,568	49.62
fishnet150	24,960,808	95.22
fishnet99	16,630,312	63.44
mobilenet_v2	3,504,872	24.51
mobilenetv2	3,504,872	13.37
regnetx_002	2,684,792	10.24
regnetx_004	5,157,512	19.67
regnetx_006	6,196,040	23.64
regnetx_008	7,259,656	27.69
regnetx_016	9,190,136	35.06
regnetx_032	15,296,552	58.35
regnetx_040	22,118,248	97.66
regnetx_064	26,209,256	114.02
regnetx_080	34,561,448	147.43
regnety_002	3,162,996	12.07
regnety_004	4,344,144	16.57
regnety_006	6,055,160	23.1
regnety_008	6,263,168	23.89
regnety_016	11,202,430	42.73
regnety_032	19,436,338	74.14
regnety_040	20,646,656	91.77
regnety_064	30,583,252	131.52
regnety_080	39,180,068	165.9
resnest101e	48,275,016	184.15
resnest14d	10,611,688	40.48
resnest200e	70,201,544	267.8
resnest269e	7,551,112	28.81
resnest26d	17,069,448	65.11
resnest50d	27,483,240	104.84
resnest50d_1s4x24d	25,677,000	97.95
resnest50d_4s2x40d	30,417,592	116.03
resnet101	44,549,160	169.94
resnet152	60,192,808	229.62
resnet18	11,689,512	44.59
resnet200	64,673,832	246.71
resnet26	15,995,176	61.02
resnet26d	16,014,408	61.09
resnet34	21,797,672	83.15
resnet34d	21,816,904	83.22
resnet50	25,557,032	97.49
resnet50d	25,576,264	97.57
resnext101_32x16d	194,026,792	740.15
resnext101_32x32d	468,530,472	1787.3
resnext101_32x48d	828,411,176	3160.14
resnext101_32x8d	88,791,336	338.71
resnext50_32x4d	25,028,904	95.48
se_resnet101	49,292,328	188.04
se_resnet152	66,770,984	254.71
se_resnet18	11,776,552	44.92
se_resnet34	21,954,856	83.75
se_resnet50	28,071,976	107.09
unet	23,202,530	88.51
vgg11	132,863,336	506.83
vgg11_bn	132,868,840	506.85
vgg13	133,047,848	507.54
vgg13_bn	133,053,736	507.56
vgg16	138,357,544	527.79
vgg16_bn	138,365,992	527.82
vgg19	143,667,240	548.05
vgg19_bn	143,678,248	548.09
vit_base_patch16_224	86,415,592	329.65
vit_base_patch16_384	86,415,592	329.65
vit_base_patch32_384	88,185,064	336.4
vit_huge_patch16_224	631,823,080	2410.21
vit_huge_patch32_384	634,772,200	2421.46
vit_large_patch16_224	304,123,880	1160.14
vit_large_patch16_384	304,123,880	1160.14
vit_large_patch32_384	306,483,176	1169.14
vit_small_patch16_224	48,602,344	185.4
wide_resnet101_2	126,886,696	484.03
wide_resnet50_2	68,883,240	262.77
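For most rows, the Size (MB) column equals the parameter count times 4 bytes (one float32 per weight) divided by 2^20; resnet18, vgg16 and efficientnet_b0, for example, all match. A few rows (such as eca_resnet18d or mobilenet_v2) do not follow this rule, so treat the formula as an approximation.

```python
def size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed to store n_params weights, in MiB (2**20 bytes); 4 bytes = float32."""
    return n_params * bytes_per_param / 2**20


for name, n_params in [("resnet18", 11_689_512), ("vgg16", 138_357_544), ("efficientnet_b0", 5_288_548)]:
    print(f"{name}: {size_mb(n_params):.2f}")
# resnet18: 44.59
# vgg16: 527.79
# efficientnet_b0: 20.17
```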

Credits

Most of the weights were trained by other people and adapted to glasses. It is worth citing:

pytorch-image-models
torchvision

FrancescoSaverioZuppichini/glasses

FrancescoSaverioZuppichini
