HBONet
Official implementation of our HBONet architecture as described in HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions (ICCV'19) by Duo Li, Aojun Zhou and Anbang Yao, evaluated on the ILSVRC2012 benchmark with the PyTorch framework.
We integrate our HBO modules into the state-of-the-art MobileNetV2 backbone as a reference case. The baseline MobileNetV2 counterparts are available in our companion repository mobilenetv2.pytorch.
Requirements
Dependencies
- PyTorch 1.0+
- NVIDIA-DALI (in development, not recommended)
Dataset
Download the ImageNet dataset and move validation images to labeled subfolders. To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
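As a quick sanity check (a minimal sketch, not part of the original instructions), the prepared validation directory should contain 1000 class subfolders that torchvision's ImageFolder can read; the path below is a placeholder for your local ImageNet location.

```python
# Sanity check: verify that valprep.sh produced the expected layout.
# '/path/to/ILSVRC2012/val' is a placeholder, not a path used by this repo.
from torchvision import datasets

val_dir = '/path/to/ILSVRC2012/val'
val_set = datasets.ImageFolder(val_dir)
assert len(val_set.classes) == 1000, 'expected 1000 class subfolders'
print('{} validation images in {} classes'.format(len(val_set), len(val_set.classes)))
```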
Pretrained models
The following statistics are reported on the ILSVRC2012 validation set with single center crop testing.
HBONet with a spectrum of width multipliers (Table 2)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 1.0 | 305 | 73.1 / 91.0 |
HBONet 0.8 | 205 | 71.3 / 89.7 |
HBONet 0.5 | 96 | 67.0 / 86.9 |
HBONet 0.35 | 61 | 62.4 / 83.7 |
HBONet 0.25 | 37 | 57.3 / 79.8 |
HBONet 0.1 | 14 | 41.5 / 65.7 |
HBONet 0.8 with a spectrum of input resolutions (Table 3)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.8 224x224 | 205 | 71.3 / 89.7 |
HBONet 0.8 192x192 | 150 | 70.0 / 89.2 |
HBONet 0.8 160x160 | 105 | 68.3 / 87.8 |
HBONet 0.8 128x128 | 68 | 65.5 / 85.9 |
HBONet 0.8 96x96 | 39 | 61.4 / 83.0 |
HBONet 0.35 with a spectrum of input resolutions (Table 4)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.35 224x224 | 61 | 62.4 / 83.7 |
HBONet 0.35 192x192 | 45 | 60.9 / 82.6 |
HBONet 0.35 160x160 | 31 | 58.6 / 80.7 |
HBONet 0.35 128x128 | 21 | 55.2 / 78.0 |
HBONet 0.35 96x96 | 12 | 50.3 / 73.8 |
HBONet with different width multipliers and different input resolutions (Table 5)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet 0.5 224x224 | 98 | 67.7 / 87.4 |
HBONet 0.6 192x192 | 108 | 67.3 / 87.3 |
HBONet 0.25 variants with different down-sampling and up-sampling rates (Table 6)
Architecture | MFLOPs | Top-1 / Top-5 Acc. (%) |
---|---|---|
HBONet(2x) 0.25 | 44 | 58.3 / 80.6 |
HBONet(4x) 0.25 | 45 | 59.3 / 81.4 |
HBONet(8x) 0.25 | 45 | 58.2 / 80.4 |
Taking HBONet 1.0 as an example, pretrained models can be easily imported using the following lines and then finetuned for other vision tasks or deployed on resource-aware platforms. (To create the variant models in Tables 5 & 6, first make the slight modifications described in the docstrings of the model file.)
import torch
from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
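For instance, a rough finetuning sketch might look like the following; it assumes the model exposes its final linear layer as `classifier` (as in the companion mobilenetv2.pytorch code) and targets a hypothetical 10-class task, so adjust the attribute name and sizes to your setting.

```python
# Finetuning sketch: freeze the backbone and retrain only a new classifier head.
# The `classifier` attribute name and the 10-class task are assumptions for illustration.
import torch
import torch.nn as nn
from models.imagenet import hbonet

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))

for p in net.parameters():
    p.requires_grad = False  # keep the pretrained backbone fixed

net.classifier = nn.Linear(net.classifier.in_features, 10)  # new task-specific head
optimizer = torch.optim.SGD(net.classifier.parameters(), lr=0.01, momentum=0.9)
```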
Usage
Training
The following configuration reproduces our reported results and is identical to that of mobilenetv2.pytorch for a fair comparison.
- batch size 256
- epochs 150
- learning rate 0.05
- LR decay strategy cosine
- weight decay 0.00004
python imagenet.py \
-a hbonet \
-d <path-to-ILSVRC2012-data> \
--epochs 150 \
--lr-decay cos \
--lr 0.05 \
--wd 4e-5 \
-c <path-to-save-checkpoints> \
--width-mult <width-multiplier> \
--input-size <input-resolution> \
-j <num-workers>
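For reference, the `--lr-decay cos` option anneals the learning rate over the training epochs; a minimal sketch of a cosine schedule with the settings above (annealing from 0.05 towards 0 over 150 epochs, which may differ slightly from the exact bookkeeping in imagenet.py) is:

```python
import math

def cosine_lr(initial_lr, epoch, total_epochs):
    """Cosine annealing from initial_lr down to 0 over total_epochs."""
    return 0.5 * initial_lr * (1 + math.cos(math.pi * epoch / total_epochs))

for epoch in (0, 75, 149):
    print(epoch, round(cosine_lr(0.05, epoch, 150), 4))  # 0.05 -> 0.025 -> ~0
```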
Test
python imagenet.py \
-a hbonet \
-d <path-to-ILSVRC2012-data> \
--weight <pretrained-pth-file> \
--width-mult <width-multiplier> \
--input-size <input-resolution> \
-e
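Beyond the evaluation script, a pretrained model can also be queried directly; the sketch below uses the standard ImageNet single-center-crop preprocessing at 224x224 (the normalization statistics are the common ImageNet values and 'example.jpg' is a placeholder image; neither is taken verbatim from this repo).

```python
# Single-image inference sketch with standard 224x224 center-crop preprocessing.
# 'example.jpg' is a placeholder; the normalization values are the usual ImageNet stats.
import torch
from PIL import Image
from torchvision import transforms
from models.imagenet import hbonet

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

net = hbonet()
net.load_state_dict(torch.load('pretrained/hbonet_1_0.pth'))
net.eval()

img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    logits = net(img)
_, top5 = logits.topk(5)
print('top-5 class indices:', top5.squeeze(0).tolist())
```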
Citations
If you find our work useful in your research, please consider citing:
@InProceedings{Li_2019_ICCV,
    author = {Li, Duo and Zhou, Aojun and Yao, Anbang},
    title = {HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions},
    booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
    month = {Oct},
    year = {2019}
}