Bag of tricks for long-tailed visual recognition with deep convolutional neural networks
This repository is the official PyTorch implementation of Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks, which provides practical and effective tricks used in long-tailed image classification.
- We recommend installing Github Sort Content, a browser extension that can sort the columns of tables on GitHub. With Github Sort Content, you can easily find the most effective trick for each dataset in trick_gallery.md. See this question on Stack Overflow for more information.
Development log
2022-01-05
- Add DiVE (ICCV 2021), a knowledge distillation method, to trick_gallery.md.

2021-11-08
- Add InfluenceBalancedLoss (ICCV 2021), which belongs to two-stage training, to trick_gallery.md.

2021-05-19
- Add configs and experimental results of BBN-style sampling (CVPR 2020), which consists of a uniform sampler and a reverse sampler, to trick_gallery.md.
Previous logs
Trick gallery
Brief introduction
We divide the long-tail related tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details of these four trick families, see the original paper. A minimal sketch of one re-weighting variant follows below.
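As an illustration of the re-weighting family, here is a minimal sketch of the class-balanced weights of Cui et al., CVPR 2019 (one of the re-weighting tricks in the gallery). It is an independent example, not this repo's exact implementation.

```python
import torch

# Class-balanced re-weighting (Cui et al., CVPR 2019): weight each class by
# the inverse of its "effective number" of samples E_c = (1 - beta^n_c) / (1 - beta).
def class_balanced_weights(samples_per_class, beta=0.9999):
    n = torch.as_tensor(samples_per_class, dtype=torch.float)
    effective_num = (1.0 - beta ** n) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(samples_per_class) / weights.sum()

# Per-class counts of CIFAR-10-LT with imbalance factor 100 (see the dataset table below).
weights = class_balanced_weights([5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50])
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```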
Detailed information:
- Trick gallery: tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.
Main requirements
```
torch >= 1.4.0
torchvision >= 0.5.0
tensorboardX >= 2.1
tensorflow >= 1.14.0  # to convert long-tailed CIFAR datasets from tfrecords to jpgs
Python 3
apex
```
- We provide the detailed requirements in requirements.txt. You can run

```bash
pip install -r requirements.txt
```

to create the same running environment as ours.
- We recommend installing apex to save GPU memory:
```bash
pip install -U pip
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- If apex is not installed, the distributed training with DistributedDataParallel in our codes cannot be used.
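Apex saves GPU memory through its mixed-precision mode. Below is a minimal sketch of apex.amp usage, assuming the common O1 opt level; the repo's own trainer may wire this up differently.

```python
import torch
from apex import amp

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# O1 patches common ops to run in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

loss = model(torch.randn(4, 512).cuda()).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # the loss is scaled to avoid fp16 gradient underflow
optimizer.step()
```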
Preparing the datasets
We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).
The detailed information of these datasets is shown as follows:

Datasets | CIFAR-10-LT (IF=100) | CIFAR-10-LT (IF=50) | CIFAR-100-LT (IF=100) | CIFAR-100-LT (IF=50) | ImageNet-LT | iNat18
---|---|---|---|---|---|---
Training images | 12,406 | 13,996 | 10,847 | 12,608 | 115,846 | 437,513
Classes | 10 | 10 | 100 | 100 | 1,000 | 8,142
Max images | 5,000 | 5,000 | 500 | 500 | 1,280 | 1,000
Min images | 50 | 100 | 5 | 10 | 5 | 2
Imbalance factor | 100 | 50 | 100 | 50 | 256 | 500
- CIFAR-10-LT-100 means the long-tailed CIFAR-10 dataset with the imbalance factor β = 100.
- The imbalance factor is defined as the number of training images in the largest class divided by that in the smallest: β = N_max / N_min.
Data format
The annotation of a dataset is a dict consisting of two fields: `annotations` and `num_classes`. The field `annotations` is a list of dicts, each with the keys `image_id`, `fpath`, `im_height`, `im_width`, and `category_id`.
Here is an example.
```
{
    'annotations': [
        {
            'image_id': 1,
            'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
            'im_height': 600,
            'im_width': 800,
            'category_id': 7477
        },
        ...
    ],
    'num_classes': 8142
}
```
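The sketch below shows one way to load and sanity-check such an annotation file. `load_annotations` is a hypothetical helper, not part of this repo.

```python
import json
from collections import Counter

# Hypothetical helper: load an annotation file in the format above and
# print a few statistics about its class distribution.
def load_annotations(json_path):
    with open(json_path) as f:
        anno = json.load(f)
    counts = Counter(item['category_id'] for item in anno['annotations'])
    print(f"{len(anno['annotations'])} images, "
          f"{len(counts)}/{anno['num_classes']} classes present, "
          f"max/min images per class: {max(counts.values())}/{min(counts.values())}")
    return anno

anno = load_annotations('./iNat18_train.json')
```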
CIFAR-LT
Cao et al., NeurIPS 2019 followed the method of Cui et al., CVPR 2019 to generate the CIFAR-LT randomly. They modified the CIFAR datasets provided by PyTorch as this file shows. A sketch of the sampling profile is given below.
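The exponential imbalance profile of Cui et al., CVPR 2019 can be sketched in a few lines (this is an illustration, not the authors' exact code):

```python
# Class i keeps n_max * (1 / imb_factor)^(i / (num_classes - 1)) images,
# so class 0 keeps n_max images and the last class keeps n_max / imb_factor.
def long_tailed_counts(n_max=5000, num_classes=10, imb_factor=100):
    return [int(n_max * (1.0 / imb_factor) ** (i / (num_classes - 1)))
            for i in range(num_classes)]

counts = long_tailed_counts()
print(counts)       # [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
print(sum(counts))  # 12406 -- matches the CIFAR-10-LT-100 column in the table above
```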
ImageNet-LT
You can use the following steps to convert from the original images of ImageNet-LT.

1. Download the original ILSVRC-2012. Suppose you have downloaded and reorganized it at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
2. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) from GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them at path /downloaded/ImageNet-LT/.
3. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json. A rough sketch of what the script does follows below.

```bash
# Convert from the original format of ImageNet-LT
python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaded/ImageNet/ --output_path ./
```
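For intuition, here is a rough sketch of the conversion, assuming each line of the split file holds "<relative image path> <class label>" (the format of the original ImageNet-LT release); see tools/convert_from_ImageNet.py for the actual behavior.

```python
import json
import os
from PIL import Image

# Hypothetical re-implementation for illustration only.
def convert_split(split_file, image_root, output_json):
    annotations = []
    with open(split_file) as f:
        for image_id, line in enumerate(f):
            rel_path, label = line.split()
            fpath = os.path.join(image_root, rel_path)
            with Image.open(fpath) as im:  # read image size for im_height/im_width
                width, height = im.size
            annotations.append({'image_id': image_id, 'fpath': fpath,
                                'im_height': height, 'im_width': width,
                                'category_id': int(label)})
    with open(output_json, 'w') as f:
        json.dump({'annotations': annotations, 'num_classes': 1000}, f)
```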
iNat18
You can use the following steps to convert from the original format of iNaturalist 2018.

1. Download the images and annotations from iNaturalist 2018 first. Suppose you have downloaded them at path /downloaded/iNat18/.
2. Run tools/convert_from_iNat.py, and use the generated iNat18_train.json and iNat18_val.json to train. A rough sketch of the conversion follows below.

```bash
# Convert from the original format of iNaturalist
# See tools/convert_from_iNat.py for more details of args
python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json

python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json
```
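For intuition, here is a rough sketch of the conversion, assuming the official COCO-style layout of train2018.json ("images" plus "annotations" lists); see tools/convert_from_iNat.py for the exact behavior.

```python
import json
import os

# Hypothetical re-implementation for illustration only.
def convert_inat(input_json_file, image_path, output_json_file):
    with open(input_json_file) as f:
        src = json.load(f)
    # Map each image id to its class label.
    label_of = {a['image_id']: a['category_id'] for a in src['annotations']}
    annotations = [{'image_id': img['id'],
                    'fpath': os.path.join(image_path, img['file_name']),
                    'im_height': img['height'],
                    'im_width': img['width'],
                    'category_id': label_of[img['id']]}
                   for img in src['images']]
    with open(output_json_file, 'w') as f:
        json.dump({'annotations': annotations, 'num_classes': 8142}, f)
```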
Usage
In this repo:
- The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, are obtained by DataParallel training with apex.
- The results of iNat18 (ResNet-50), which need more than one GPU to train, are obtained by DistributedDataParallel training with apex.
- If more than one GPU is used, DistributedDataParallel training is more efficient than DataParallel training, especially when CPU resources are limited; the sketch below contrasts the two modes.
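A minimal, illustrative contrast of the two modes (the repo's scripts below wire these up for you, so this is not code you need to run):

```python
import torch

model = torch.nn.Linear(512, 10).cuda()

# DataParallel: a single process scatters each batch across GPUs and gathers
# gradients on GPU 0; simple, but the one process can become CPU-bound.
dp_model = torch.nn.DataParallel(model, device_ids=[0, 1])

# DistributedDataParallel: one process per GPU, gradients all-reduced during
# backward; requires a process group (set up by the launch script below).
# torch.distributed.init_process_group(backend='nccl')
# ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```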
Training
Parallel training with DataParallel
1. To train:

```bash
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,4`.
bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs
```
Distributed training with DistributedDataParallel
1. Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to your own socket name.

```bash
export NCCL_SOCKET_IFNAME=[your own socket name]
```

2. To train:

```bash
# To train long-tailed CIFAR-10 with an imbalance factor of 50.
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
# `NUM_GPUs` is the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs
```
Validation
You can get the validation accuracy and the corresponding confusion matrix by running the following commands.
See main/valid.py for more details; a sketch of the computed metrics follows below.

1. Change the TEST.MODEL_FILE in the yaml to your own path of the trained model first.
2. To validate:

```bash
# `GPUs` are the GPUs you want to use, such as `0,1,4`.
python main/valid.py --cfg [Your yaml] --gpus GPUs
```
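A minimal sketch of the metrics reported at validation time, i.e., top-1 error and a confusion matrix accumulated over the validation set (illustrative only; main/valid.py is the reference):

```python
import torch

def evaluate(model, loader, num_classes, device='cuda'):
    # confusion[t, p] counts validation images of true class t predicted as p.
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            for t, p in zip(labels, preds):
                confusion[t, p] += 1
    # Top-1 error: 100 * (1 - correct / total), with correct on the diagonal.
    top1_error = 100.0 * (1 - confusion.diag().sum().item() / confusion.sum().item())
    return top1_error, confusion
```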
The comparison between the baseline results using our codes and the references [Cui, Kang]
- We use Top-1 error rates as our evaluation metric.
- For ImageNet-LT, we found that the color_jitter augmentation, which is adopted by other methods, was not included in our experiments. So, in this repo, we add the color_jitter augmentation on ImageNet-LT. The old baseline without color_jitter reaches 64.89, which is 1.15 points higher than the new baseline.
- You can click the Baseline in the table below to see the experimental settings and the corresponding running commands.
Datasets | CIFAR-10-LT (IF=100) | CIFAR-10-LT (IF=50) | CIFAR-100-LT (IF=100) | CIFAR-100-LT (IF=50) | ImageNet-LT | iNat18
---|---|---|---|---|---|---
Backbones | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-10 | ResNet-50
Baselines using our codes | 28.05 | 23.55 | 62.27 | 56.22 | 63.74 | 40.55
Reference [Cui, Kang, Liu] | 29.64 | 25.19 | 61.68 | 56.15 | 64.40 | 42.86
Paper collection of long-tailed visual recognition
Awesome-of-Long-Tailed-Recognition
Long-Tailed-Classification-Leaderboard
Citation
```
@inproceedings{zhang2021tricks,
  author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
  title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
  pages     = {3447--3455},
  booktitle = {AAAI},
  year      = {2021},
}
```
Contacts
If you have any questions about our work, please do not hesitate to contact us via the emails provided in the paper.