Compact Bilinear Pooling in TensorFlow

Compact Bilinear Pooling

This repository contains a TensorFlow implementation of Compact Bilinear Pooling.

Usage

The operation is implemented by compact_bilinear_pooling_layer in compact_bilinear_pooling.py. Its signature and docstring are:

def compact_bilinear_pooling_layer(bottom1, bottom2, output_dim, sum_pool=True,
    rand_h_1=None, rand_s_1=None, rand_h_2=None, rand_s_2=None,
    seed_h_1=1, seed_s_1=3, seed_h_2=5, seed_s_2=7, sequential=True,
    compute_size=128):
    """
    Compute compact bilinear pooling over two bottom inputs. Reference:
    Yang Gao, et al. "Compact Bilinear Pooling." in Proceedings of IEEE
    Conference on Computer Vision and Pattern Recognition (2016).
    Akira Fukui, et al. "Multimodal Compact Bilinear Pooling for Visual Question
    Answering and Visual Grounding." arXiv preprint arXiv:1606.01847 (2016).
    Args:
        bottom1: 1st input, 4D Tensor of shape [batch_size, height, width, input_dim1].
        bottom2: 2nd input, 4D Tensor of shape [batch_size, height, width, input_dim2].
        output_dim: output dimension for compact bilinear pooling.
        sum_pool: (Optional) If True, sum the output along height and width
                  dimensions and return output shape [batch_size, output_dim].
                  Otherwise return [batch_size, height, width, output_dim].
                  Default: True.
        rand_h_1: (Optional) a 1D NumPy array containing indices in the interval
                  `[0, output_dim)`. Automatically generated from `seed_h_1`
                  if None.
        rand_s_1: (Optional) a 1D NumPy array of 1 and -1, with the same shape
                  as `rand_h_1`. Automatically generated from `seed_s_1` if
                  None.
        rand_h_2: (Optional) a 1D NumPy array containing indices in the interval
                  `[0, output_dim)`. Automatically generated from `seed_h_2`
                  if None.
        rand_s_2: (Optional) a 1D NumPy array of 1 and -1, with the same shape
                  as `rand_h_2`. Automatically generated from `seed_s_2` if
                  None.
        sequential: (Optional) If True, use the sequential FFT and IFFT
                    instead of tf.batch_fft or tf.batch_ifft to avoid
                    out-of-memory (OOM) errors.
                    Note: the sequential FFT and IFFT are only available on GPU.
                    Default: True.
        compute_size: (Optional) The maximum size of the sub-batch forwarded
                      through FFT or IFFT at one time. A large compute_size may
                      be faster but can cause OOM and FFT failure. This
                      parameter only takes effect when sequential == True.
                      Default: 128.
    Returns:
        Compact bilinear pooled results of shape [batch_size, output_dim] or
        [batch_size, height, width, output_dim], depending on `sum_pool`.
    """

Testing

To test whether it works correctly on your system, run:

python compact_bilinear_pooling_test.py

The tests pass if the command above runs without errors.

Note that sequential=True (the default) only supports GPU computation; no CPU kernel is available.
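
As a rough CPU-side illustration of the idea behind the sequential option and `compute_size`, the NumPy sketch below runs the FFT over sub-batches of at most `compute_size` rows and checks that the result matches a single full-batch FFT. It is only an analogy under assumptions: the function name `sequential_batch_fft_np` is hypothetical, and the repository's GPU kernel in sequential_fft/ may be organized differently.

```python
import numpy as np


def sequential_batch_fft_np(x, compute_size=128):
    """FFT along the last axis, at most compute_size rows at a time."""
    # Flatten all leading dimensions into one batch dimension.
    flat = x.reshape(-1, x.shape[-1])
    out = np.empty(flat.shape, dtype=np.complex128)
    for start in range(0, flat.shape[0], compute_size):
        stop = start + compute_size
        # Only this sub-batch is transformed in the current step.
        out[start:stop] = np.fft.fft(flat[start:stop], axis=-1)
    return out.reshape(x.shape)


signal = np.random.RandomState(0).randn(300, 8)
chunked = sequential_batch_fft_np(signal, compute_size=128)
print(np.allclose(chunked, np.fft.fft(signal, axis=-1)))
```

Smaller sub-batches bound the peak working memory of each FFT call at the cost of more calls, which is the trade-off the `compute_size` description refers to.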

Building

The sequential_fft/build/sequential_batch_fft.so is built against TensorFlow version 1.12.0 with CUDA 8.0 and g++ 5.4.0, which should be compatible with the official build of TensorFlow 1.12.0 on Ubuntu/Linux 64-bit.

If you set sequential=True (the default), sequential_batch_fft.so must be compatible with your TensorFlow installation.

If you installed TensorFlow from source, or want to use a TensorFlow version other than 1.12.0 that may have been built with a different compiler and a different CUDA version, you may need to rebuild sequential_batch_fft.so with compile.sh in sequential_fft/, using the same CUDA version and a compatible C++ compiler. To see the compiler version of an official TF build, run the following in Python:

import tensorflow as tf; print(tf.__compiler_version__)

Reference

Yang Gao, et al. "Compact Bilinear Pooling." in Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition (2016).
Akira Fukui, et al. "Multimodal Compact Bilinear Pooling for Visual Question
Answering and Visual Grounding." arXiv preprint arXiv:1606.01847 (2016).
