BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
Introduction
This is the official PyTorch implementation of BatchFormer (CVPR 2022) for Long-Tailed Recognition, Domain Generalization, Compositional Zero-Shot Learning, and Contrastive Learning.
Sample Relationship Exploration for Robust Representation Learning
Please also refer to BatchFormerV2, which extends the BatchFormer module to vision Transformers for dense representation learning.
Main Results
Long-Tailed Recognition
ImageNet-LT
| Method | All (R10) | Many (R10) | Med (R10) | Few (R10) | All (R50) | Many (R50) | Med (R50) | Few (R50) |
|---|---|---|---|---|---|---|---|---|
| RIDE (3 experts) [1] | 44.7 | 57.0 | 40.3 | 25.5 | 53.6 | 64.9 | 50.4 | 33.2 |
| +BatchFormer | 45.7 | 56.3 | 42.1 | 28.3 | 54.1 | 64.3 | 51.4 | 35.1 |
| PaCo [2] | - | - | - | - | 57.0 | 64.8 | 55.9 | 39.1 |
| +BatchFormer | - | - | - | - | 57.4 | 62.7 | 56.7 | 42.1 |
Here, we report the result on one-stage RIDE (ResNeXt-50):
| Method | All | Many | Medium | Few |
|---|---|---|---|---|
| RIDE (3 experts)* | 55.9 | 67.3 | 52.8 | 34.6 |
| +BatchFormer | 56.5 | 66.6 | 54.2 | 36.0 |
iNaturalist 2018
| Method | All | Many | Medium | Few |
|---|---|---|---|---|
| RIDE (3 experts) | 72.5 | 68.1 | 72.7 | 73.2 |
| +BatchFormer | 74.1 | 65.5 | 74.5 | 75.8 |
Object Detection (V2)
| Method | AP | AP50 | AP75 | APS | APM | APL | Model |
|---|---|---|---|---|---|---|---|
| DETR | 34.8 | 55.6 | 35.8 | 14.0 | 37.2 | 54.6 | |
| +BatchFormerV2 | 36.9 | 57.9 | 38.5 | 15.6 | 40.0 | 55.9 | download |
| Conditional DETR | 40.9 | 61.8 | 43.3 | 20.8 | 44.6 | 59.2 | |
| +BatchFormerV2 | 42.3 | 63.2 | 45.1 | 21.9 | 46.0 | 60.7 | download |
| Deformable DETR | 43.8 | 62.6 | 47.7 | 26.4 | 47.1 | 58.0 | |
| +BatchFormerV2 | 45.5 | 64.3 | 49.8 | 28.3 | 48.6 | 59.4 | download |
All detection models use a ResNet-50 backbone and are trained for 50 epochs.
Panoptic Segmentation (V2)
| Method | PQ | SQ | RQ | PQ(th) | SQ(th) | RQ(th) | PQ(st) | SQ(st) | RQ(st) | AP |
|---|---|---|---|---|---|---|---|---|---|---|
| DETR | 43.4 | 79.3 | 53.8 | 48.2 | 79.8 | 59.5 | 36.3 | 78.5 | 45.3 | 31.1 |
| +BatchFormerV2 | 45.1 | 80.3 | 55.3 | 50.5 | 81.1 | 61.5 | 37.1 | 79.1 | 46.0 | 33.4 |
Contrastive Learning
| Method | Epochs | Top-1 | Pretrained |
|---|---|---|---|
| MoCo-v2 [3] | 200 | 67.5 | |
| +BatchFormer | 200 | 68.4 | download |
| MoCo-v3 [4] | 100 | 68.9 | |
| +BatchFormer | 100 | 70.1 | download |
Here, we provide the pretrained MoCo-v3 model corresponding to this training strategy.
Domain Generalization
ResNet-18
| Method | PACS | VLCS | OfficeHome | Terra |
|---|---|---|---|---|
| SWAD [5] | 82.9 | 76.3 | 62.1 | 42.1 |
| +BatchFormer | 83.7 | 76.9 | 64.3 | 44.8 |
Compositional Zero-Shot Learning
| Method | MIT-States (AUC) | MIT-States (HM) | UT-Zap50K (AUC) | UT-Zap50K (HM) | C-GQA (AUC) | C-GQA (HM) |
|---|---|---|---|---|---|---|
| CGE* [6] | 6.3 | 20.0 | 31.5 | 46.5 | 3.7 | 14.9 |
| +BatchFormer | 6.7 | 20.6 | 34.6 | 49.0 | 3.8 | 15.5 |
Few-Shot Learning
Experiments on CUB.
| Method | Unseen | Seen | Harmonic mean |
|---|---|---|---|
| CUB [7]* | 67.5 | 65.1 | 66.3 |
| +BatchFormer | 68.2 | 65.8 | 67.0 |
Image Classification (V2)
| Method | Top-1 | Top-5 |
|---|---|---|
| DeiT-T | 72.2 | 91.1 |
| +BatchFormerV2 | 72.7 | 91.5 |
| DeiT-S | 79.8 | 95.0 |
| +BatchFormerV2 | 80.4 | 95.2 |
| DeiT-B | 81.7 | 95.5 |
| +BatchFormerV2 | 82.2 | 95.8 |
References
1. Long-tailed recognition by routing diverse distribution-aware experts. In ICLR, 2021.
2. Parametric contrastive learning. In ICCV, 2021.
3. Improved baselines with momentum contrastive learning. arXiv preprint, 2020.
4. An empirical study of training self-supervised vision transformers. In ICCV, 2021.
5. Domain generalization by seeking flat minima. In NeurIPS, 2021.
6. Learning graph embeddings for compositional zero-shot learning. In CVPR, 2021.
7. Contrastive learning based hybrid networks for long-tailed image classification. In CVPR, 2021.
PyTorch Code
The proposed BatchFormer module can be implemented in a few lines of PyTorch as follows:

    import torch

    def BatchFormer(x, y, encoder, is_training):
        # x: input features with shape [N, C]
        # y: labels with shape [N]
        # encoder: torch.nn.TransformerEncoderLayer(C, 4, C, 0.5)
        if not is_training:
            # BatchFormer is only applied during training;
            # inference is unchanged.
            return x, y
        pre_x = x
        # Treat the batch as a sequence of length N ([N, C] -> [N, 1, C])
        # so the Transformer attends across the samples in the mini-batch.
        x = encoder(x.unsqueeze(1)).squeeze(1)
        # Keep both the original and the batch-transformed features,
        # duplicating the labels so a shared classifier sees both streams.
        x = torch.cat([pre_x, x], dim=0)
        y = torch.cat([y, y], dim=0)
        return x, y
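
For context, here is a minimal, self-contained usage sketch. The feature dimension, backbone, and classifier below are illustrative placeholders rather than code from this repository; the encoder follows the `TransformerEncoderLayer(C, 4, C, 0.5)` construction noted in the comment above.

    import torch
    import torch.nn as nn

    C = 512  # assumed feature dimension, for illustration only
    # d_model=C, nhead=4, dim_feedforward=C, dropout=0.5
    encoder = nn.TransformerEncoderLayer(C, 4, C, 0.5)

    backbone = nn.Linear(1024, C)    # stand-in for a real feature extractor
    classifier = nn.Linear(C, 10)    # shared classifier for both feature streams
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(32, 1024)        # a mini-batch of 32 inputs
    y = torch.randint(0, 10, (32,))  # labels

    feats = backbone(x)                              # [32, C]
    feats, y = BatchFormer(feats, y, encoder, True)  # [64, C], [64]
    loss = criterion(classifier(feats), y)
    loss.backward()

Since `is_training` is False at test time, the batch Transformer is skipped entirely, so the deployed model has the same architecture and inference cost as the baseline.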
Citation
If you find this repository helpful, please consider citing:

    @inproceedings{hou2022batch,
      title={BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning},
      author={Hou, Zhi and Yu, Baosheng and Tao, Dacheng},
      booktitle={CVPR},
      year={2022}
    }

    @article{hou2022batchformerv2,
      title={BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning},
      author={Hou, Zhi and Yu, Baosheng and Wang, Chaoyue and Zhan, Yibing and Tao, Dacheng},
      journal={arXiv preprint arXiv:2204.01254},
      year={2022}
    }
Feel free to contact "zhou9878 at uni dot sydney dot edu dot au" if you have any questions.