HBONet
Official implementation of our HBONet architecture as described in HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, evaluated on the ILSVRC2012 benchmark with the PyTorch framework.
We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. The baseline MobileNetV2 counterparts are available in our companion repository mobilenetv2.pytorch.
Requirements
Dependencies
- PyTorch 1.0+
- NVIDIA-DALI (in development, not recommended)
Dataset
Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
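As a quick sanity check (a minimal sketch, not part of the original instructions), the prepared validation directory should contain 1000 class subfolders that torchvision's ImageFolder can read; the path below is a placeholder for your local ImageNet location.

```python
# Sanity check: verify that valprep.sh produced the expected layout.
# '/path/to/ILSVRC2012/val' is a placeholder, not a path used by this repo.
from torchvision import datasets

val_dir = '/path/to/ILSVRC2012/val'
val_set = datasets.ImageFolder(val_dir)
assert len(val_set.classes) == 1000, 'expected 1000 class subfolders'
print('{} validation images in {} classes'.format(len(val_set), len(val_set.classes)))
```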
Pretrained models
The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.
HBONet with a spectrum of width multipliers (Table 2)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 1.0 | 305 | 73.1 / 91.0 |
HBONet 0.8 | 205 | 71.3 / 89.7 |
HBONet 0.5 | 96 | 67.0 / 86.9 |
HBONet 0.35 | 61 | 62.4 / 83.7 |
HBONet 0.25 | 37 | 57.3 / 79.8 |
HBONet 0.1 | 14 | 41.5 / 65.7 |
HBONet 0.8 with a spectrum of input resolutions (Table 3)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.8 224x224 | 205 | 71.3 / 89.7 |
HBONet 0.8 192x192 | 150 | 70.0 / 89.2 |
HBONet 0.8 160x160 | 105 | 68.3 / 87.8 |
HBONet 0.8 128x128 | 68 | 65.5 / 85.9 |
HBONet 0.8 96x96 | 39 | 61.4 / 83.0 |
HBONet 0.35 with a spectrum of input resolutions (Table 4)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.35 224x224 | 61 | 62.4 / 83.7 |
HBONet 0.35 192x192 | 45 | 60.9 / 82.6 |
HBONet 0.35 160x160 | 31 | 58.6 / 80.7 |
HBONet 0.35 128x128 | 21 | 55.2 / 78.0 |
HBONet 0.35 96x96 | 12 | 50.3 / 73.8 |
HBONet with different width multipliers and different input resolutions (Table 5)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.5 224x224 | 98 | 67.7 / 87.4 |
HBONet 0.6 192x192 | 108 | 67.3 / 87.3 |
HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet(2x) 0.25 | 44 | 58.3 / 80.6 |
HBONet(4x) 0.25 | 45 | 59.3 / 81.4 |
HBONet(8x) 0.25 | 45 | 58.2 / 80.4 |
Taking HBONet 1.0 as an example, pretrained models can be easily imported using the following lines and then finetuned for other vision tasks or deployed on resource-aware platforms. (To create the variant models in Tables 5 & 6, first make the slight modifications described in the docstrings of the model file.)
import torch
from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
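For instance, a rough finetuning sketch might look like the following; it assumes the model exposes its final linear layer as `classifier` (as in the companion mobilenetv2.pytorch code) and targets a hypothetical 10-class task, so adjust the attribute name and sizes to your setting.

```python
# Finetuning sketch: freeze the backbone and retrain only a new classifier head.
# The `classifier` attribute name and the 10-class task are assumptions for illustration.
import torch
import torch.nn as nn
from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))

for p in net.parameters():
    p.requires_grad = False  # keep the pretrained backbone fixed

net.classifier = nn.Linear(net.classifier.in_features, 10)  # new task-specific head
optimizer = torch.optim.SGD(net.classifier.parameters(), lr=0.01, momentum=0.9)
```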
Usage
Training
The following configuration reproduces our reported results and is identical to that of mobilenetv2.pytorch for a fair comparison.
- batch size 256
- epochs 150
- learning rate 0.05
- LR decay strategy cosine
- weight decay 0.00004
python imagenet.py \
-a hbonet \
-d <path-to-ILSVRC2012-data> \
--epochs 150 \
--lr-decay cos \
--lr 0.05 \
--wd 4e-5 \
-c <path-to-save-checkpoints> \
--width-mult <width-multiplier> \
--input-size <input-resolution> \
-j <num-workers>
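For reference, the `--lr-decay cos` option anneals the learning rate over the training epochs; a minimal sketch of a cosine schedule with the settings above (annealing from 0.05 towards 0 over 150 epochs, which may differ slightly from the exact bookkeeping in imagenet.py) is:

```python
import math

def cosine_lr(initial_lr, epoch, total_epochs):
    """Cosine annealing from initial_lr down to 0 over total_epochs."""
    return 0.5 * initial_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in (0, 75, 149):
    print(epoch, round(cosine_lr(0.05, epoch, 150), 4))  # 0.05 -> 0.025 -> ~0
```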
Test
python imagenet.py \
-a hbonet \
-d <path-to-ILSVRC2012-data> \
--weight <pretrained-pth-file> \
--width-mult <width-multiplier> \
--input-size <input-resolution> \
-e
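Beyond the evaluation script, a pretrained model can also be queried directly; the sketch below uses the standard ImageNet single-center-crop preprocessing at 224x224 (the normalization statistics are the common ImageNet values and 'example.jpg' is a placeholder image; neither is taken verbatim from this repo).

```python
# Single-image inference sketch with standard 224x224 center-crop preprocessing.
# 'example.jpg' is a placeholder; the normalization values are the usual ImageNet stats.
import torch
from PIL import Image
from torchvision import transforms
from models.imagenet import hbonet

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
net.eval()

img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    logits = net(img)
_, top5 = logits.topk(5)
print('top-5 class indices:', top5.squeeze(0).tolist())
```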
Citations
If you find our work useful in your research, please consider citing:
@InProceedings{Li_2019_ICCV,
    author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
    title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month = {Oct},
    year = {2019}
}