PASS: Pictures without humAns for Self-Supervised Pretraining
TL;DR: An ImageNet replacement dataset for self-supervised pretraining without humans
Content
PASS is a large-scale image dataset that contains no humans, human parts, or other personally identifiable information. It can be used for high-quality pretraining while significantly reducing privacy concerns.
Download the dataset
The quickest way:
git clone https://github.com/yukimasano/PASS
cd PASS
source download.sh # optionally change the download directory first
More generally, all information is available on our webpage.
To download the dataset, please visit our dataset page on Zenodo, where you can download it as tar files and find the metadata.
You can also download the images from their AWS URLs, available here.
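If you go the URL route, a minimal sketch of a downloader might look as follows. It assumes a plain-text file with one image URL per line; the actual layout of the URL list may differ. The function name `download_images` and its arguments are hypothetical and not part of this repo.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_images(url_file, out_dir):
    """Download every URL listed in url_file into out_dir.

    Assumes one URL per line (an assumption about the list format).
    Returns the list of local file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(url_file) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            # Use the last path component of the URL as the local file name.
            name = os.path.basename(urlparse(url).path)
            target = os.path.join(out_dir, name)
            if not os.path.exists(target):  # lets an interrupted run resume
                urllib.request.urlretrieve(url, target)
            saved.append(target)
    return saved
```

Skipping files that already exist keeps the sketch restartable, which matters for a dataset of this size; for real use you would likely also want retries and parallel downloads.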
Pretrained models
Pretraining | Method | Epochs | IN-1k Acc. | Places205 Acc. | Weights |
---|---|---|---|---|---|
(IN-1k) | MoCo-v2 | 200 | 60.6 | 50.1 | visit MoCo-v2 repo |
PASS | MoCo-v2 | 180 | 59.1 | 52.8 | R50 weights |
PASS | MoCo-v2 | 200 | 59.5 | 52.8 | R50 weights |
PASS | MoCo-v2 | 800 | 61.2 | 54.0 | R50 weights |
PASS | MoCo-v2 (R18) | 800 | 45.3 | 44.4 | R18 weights |
PASS | MoCo-v2-CLD | 200 | 60.2 | 53.1 | R50 weights |
PASS | SwAV | 200 | 60.8 | 55.5 | R50 weights |
PASS | DINO | 100 | 61.3 | 54.6 | ViT S16 weights |
PASS | DINO | 300 | 65.0 | 55.7 | ViT S16 weights |
In the table above we give the download links to the full checkpoints (including the momentum encoder etc.) of the models we have trained. For comparison, we include MoCo-v2 trained on ILSVRC-12 ("IN-1k") and report linear probing performance on IN-1k and Places205.
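Because these are full checkpoints, the backbone weights typically have to be separated from the rest before they can be loaded into a plain network. The sketch below shows one way to do this. It assumes the MoCo-v2 checkpoints follow the key layout of the official MoCo code, where query-encoder keys start with `module.encoder_q.` and the projection head lives under `fc`; this layout is an assumption and has not been verified for every checkpoint in the table. The helper name `extract_backbone` is hypothetical.

```python
def extract_backbone(state_dict, prefix="module.encoder_q."):
    """Return only the query-encoder weights, with the prefix removed.

    Keys outside the query encoder (momentum encoder, queue) and keys of
    the projection head (assumed to live under `fc`) are dropped.
    """
    backbone = {}
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            continue  # momentum encoder, queue, etc.
        name = key[len(prefix):]
        if name.startswith("fc."):
            continue  # projection head, not part of the backbone
        backbone[name] = value
    return backbone
```

With PyTorch, the result could then be passed to `model.load_state_dict(..., strict=False)` on a torchvision ResNet-50, after reading the checkpoint with `torch.load(path, map_location="cpu")` and taking its `"state_dict"` entry (again assuming that entry exists). `strict=False` is needed because the classifier `fc` weights are intentionally absent.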
Pretrained models from PyTorch Hub
import torch
vits16_100ep = torch.hub.load('yukimasano/PASS:main', 'dino_100ep_vits16')
vits16 = torch.hub.load('yukimasano/PASS:main', 'dino_vits16')
r50_swav_200ep = torch.hub.load('yukimasano/PASS:main', 'swav_resnet50')
r50_moco_800ep = torch.hub.load('yukimasano/PASS:main', 'moco_resnet50')
r50_moco_cld_200ep = torch.hub.load('yukimasano/PASS:main', 'moco_cld_resnet50')
PASSify your dataset
In the folder PASSify of this repo, you can find automated scripts that try to remove humans from image datasets.
Contribute your models
Please let us know if you have a model pretrained on this dataset, and we will add it to the list above.
Citation
@Article{asano21pass,
author = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
title = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
journal = "NeurIPS Track on Datasets and Benchmarks",
year = "2021"
}