researchmm/DBTNet


Code for our NeurIPS'19 paper "Learning Deep Bilinear Transformation for Fine-grained Image Representation"

DBTNet

MXNet version of the code for our NeurIPS'19 paper "Learning Deep Bilinear Transformation for Fine-grained Image Representation"

Bilinear feature transformation has shown state-of-the-art performance in learning fine-grained image representations. The proposed DBTNet deeply integrates bilinear features into CNNs to learn such representations.
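For readers new to the idea, generic bilinear pooling can be sketched as below. This is a minimal pure-Python illustration of the classical operation (averaging the outer product of each local descriptor with itself), not the DBT block from the paper; the function name `bilinear_pool` and the toy inputs are assumptions made for this example.

```python
from typing import List


def bilinear_pool(features: List[List[float]]) -> List[float]:
    """Average the outer product x x^T over all spatial locations.

    `features` holds one C-dimensional descriptor per location.
    The result is the flattened C x C matrix (length C * C).
    """
    if not features:
        raise ValueError("need at least one location")
    c = len(features[0])
    pooled = [0.0] * (c * c)
    for x in features:
        if len(x) != c:
            raise ValueError("all descriptors must have the same length")
        # Accumulate every pairwise channel interaction x[i] * x[j].
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += x[i] * x[j]
    n = len(features)
    return [value / n for value in pooled]


if __name__ == "__main__":
    # Two locations, two channels each.
    print(bilinear_pool([[1.0, 2.0], [3.0, 0.0]]))  # [5.0, 1.0, 1.0, 2.0]
```

The output length grows quadratically with the number of channels (C channels give C * C values), which is why the representation dimension is worth tracking for bilinear methods.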

Framework

Main Results

Method	Dimension	CUB-200-2011	Stanford-Car	Aircraft
Compact Bilinear	14k	81.6	88.6	81.6
Kernel Pooling	14k	84.7	91.1	85.7
iSQRT-COV	8k	87.3	91.7	89.5
iSQRT-COV	32k	88.1	92.8	90.0
DBTNet-50 (ours)	2k	87.5	94.1	91.2
DBTNet-101 (ours)	2k	88.1	94.5	91.6

Prerequisites

MXNet 1.3.1

GluonCV 0.3.0
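
A quick way to confirm that the environment matches these versions is sketched below. The helper names (`installed_version`, `check_prerequisites`) are hypothetical, and the check assumes each package exposes a `__version__` attribute under the import names `mxnet` and `gluoncv`.

```python
import importlib
from typing import Dict, Optional

# Versions listed under Prerequisites.
REQUIRED = {"mxnet": "1.3.1", "gluoncv": "0.3.0"}


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_prerequisites(required: Dict[str, str] = REQUIRED) -> Dict[str, str]:
    """Return {module: problem} for every mismatch; an empty dict means OK."""
    problems = {}
    for module_name, wanted in required.items():
        found = installed_version(module_name)
        if found is None:
            problems[module_name] = f"not installed (need {wanted})"
        elif found != wanted:
            problems[module_name] = f"found {found}, need {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_prerequisites().items():
        print(f"{name}: {problem}")
```

An empty result means both packages were imported and report the expected versions. Newer releases may well work, but only the versions above are named here.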

Quick Start

Prepare the data:

Download the ImageNet data:

cd data/imagenet/
wget https://australiav100data.blob.core.windows.net/heliang/imagenet_train.rec
wget https://australiav100data.blob.core.windows.net/heliang/imagenet_train.idx
wget https://australiav100data.blob.core.windows.net/heliang/imagenet_val.rec
wget https://australiav100data.blob.core.windows.net/heliang/imagenet_val.idx
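
Large downloads can fail partway, so it may help to verify that all four files arrived before training. The sketch below only checks that each file exists and is non-empty. The helper name `missing_files` is hypothetical, and the file names are taken from the wget commands above.

```python
from pathlib import Path
from typing import List

# File names taken from the download commands.
IMAGENET_FILES = [
    "imagenet_train.rec",
    "imagenet_train.idx",
    "imagenet_val.rec",
    "imagenet_val.idx",
]


def missing_files(data_dir: str, expected: List[str] = IMAGENET_FILES) -> List[str]:
    """Return the expected files that are absent or empty in data_dir."""
    root = Path(data_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(missing_files("data/imagenet"))
```

An empty list means every file is present. This does not validate the record contents themselves.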

Download the CUB-200-2011 dataset:

cd data/
wget https://australiav100data.blob.core.windows.net/heliang/cub.tar
tar -xvf cub.tar

Train the model on the ImageNet dataset:

cd code/
bash train_imagenet_dbt.sh

Fine-tune the model on the CUB-200-2011 dataset:

The ImageNet pretrained model is available.

cd code/
bash ft_cub_dbt.sh

PyTorch Version

Ongoing. Reimplementations of the DBT code in PyTorch are welcome; please feel free to share yours.

Citation

If any part of our paper or code is helpful to your work, please cite:

@incollection{NIPS2019_8680,
title = {Learning Deep Bilinear Transformation for Fine-grained Image Representation},
author = {Zheng, Heliang and Fu, Jianlong and Zha, Zheng-Jun and Luo, Jiebo},
booktitle = {Advances in Neural Information Processing Systems 32},
pages = {4279--4288},
year = {2019}
}
