  • Stars: 154
  • Rank 242,095 (Top 5 %)
  • Language: Python
  • License: MIT License
  • Created about 8 years ago
  • Updated over 5 years ago

Repository Details

Visual Question Answering task written in Keras that answers questions about images

Visual Question Answering with Keras

Recent developments in Deep Learning have paved the way for tasks involving multimodal learning. Visual Question Answering (VQA) is one such challenge; it requires high-level scene interpretation from images combined with language modelling of the relevant questions and answers. Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. This is a Keras implementation of one such end-to-end system.

Check out the demo here: Demo

Architecture

The learning architecture behind this demo is based on the model proposed in the VQA paper.

(Figure: architecture diagram)

The problem is treated as a classification task, in which the top 1000 answers are chosen as classes. Each image is passed through the VGG-19 model, which produces a 4096-dimensional vector at the second-to-last layer. The tokens in the question are first embedded into 300-dimensional GloVe vectors and then passed through a 2-layer LSTM. Both representations are then passed through a dense layer of 1024 units and combined using point-wise multiplication. The resulting vector serves as input to a fully-connected model with a tanh layer and a final softmax layer.
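The fusion step described above can be sketched without any framework to make the data flow explicit. This is a minimal illustration, not the repository's Keras code: the function names, the random weights, and the toy dimensions are all hypothetical. The real sizes would be 4096 for the image vector, 1024 for the fused vector, and 1000 for the output. The text does not state an activation for the two 1024-unit projections, so they are left linear here as an assumption.

```python
import math
import random


def dense(x, weights, bias, activation=None):
    """Fully connected layer: y = activation(W x + b)."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Random weights standing in for trained parameters (illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_classify(image_vec, question_vec, params):
    """Project both modalities, multiply point-wise, then apply tanh and softmax."""
    # Projection activations are not specified in the text; left linear (assumption).
    img = dense(image_vec, *params["image_proj"])
    qst = dense(question_vec, *params["question_proj"])
    fused = [a * b for a, b in zip(img, qst)]  # point-wise multiplication
    hidden = dense(fused, *params["hidden"], activation=math.tanh)
    return softmax(dense(hidden, *params["output"]))


# Toy sizes so the example runs instantly; real sizes would be 4096 / 1024 / 1000.
IMAGE_DIM, QUESTION_DIM, FUSED_DIM, NUM_CLASSES = 8, 6, 4, 5
rng = random.Random(0)
params = {
    "image_proj": random_layer(rng, IMAGE_DIM, FUSED_DIM),
    "question_proj": random_layer(rng, QUESTION_DIM, FUSED_DIM),
    "hidden": random_layer(rng, FUSED_DIM, FUSED_DIM),
    "output": random_layer(rng, FUSED_DIM, NUM_CLASSES),
}
image_vec = [rng.random() for _ in range(IMAGE_DIM)]        # stands in for VGG-19 features
question_vec = [rng.random() for _ in range(QUESTION_DIM)]  # stands in for the LSTM output
probs = fuse_and_classify(image_vec, question_vec, params)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

The output is a probability distribution over the answer classes, and the predicted answer is the class with the highest probability.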

Data

The preprocessed features provided by the VT Vision Lab were used. They consist of images transformed through the VGG-19 model and indexed tokens.
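To give a rough idea of what "indexed tokens" and answer classes look like, here is a small sketch. It is not the VT Vision Lab preprocessing code: the helper names, the whitespace tokenizer, the use of index 0 for padding and unknown words, and the reading of "top answers" as "most frequent answers" are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(question):
    """Lowercase, drop a trailing question mark, and split on whitespace."""
    return question.lower().rstrip("?").split()


def build_answer_classes(answers, top_k=1000):
    """Map the top_k most frequent answers to class indices 0..top_k-1."""
    most_common = Counter(answers).most_common(top_k)
    return {answer: idx for idx, (answer, _count) in enumerate(most_common)}


def build_vocab(questions):
    """Assign each distinct token an index, reserving 0 for padding/unknown."""
    vocab = {}
    for question in questions:
        for token in tokenize(question):
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def index_question(question, vocab, max_len):
    """Convert a question to a fixed-length list of token indices."""
    indices = [vocab.get(token, 0) for token in tokenize(question)][:max_len]
    return indices + [0] * (max_len - len(indices))


# Hypothetical toy data.
questions = ["What color is the cat?", "Is the cat sleeping?"]
answers = ["yes", "black", "yes", "no", "black", "yes"]

answer_classes = build_answer_classes(answers, top_k=2)
vocab = build_vocab(questions)
indexed = index_question("Is the cat black?", vocab, max_len=6)
```

With this toy data, only "yes" and "black" become classes, and the word "black" in the question maps to 0 because it never appeared in the questions used to build the vocabulary.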

Installation

The following packages need to be installed before running the scripts:

Then go to the data folder and download the files listed there.

Training

Run python train.py along with the following optional parameters: --epoch, --batch_size, --data_limit.

To evaluate the model on the validation set, run python train.py --type val.
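The flags above could be wired up with argparse roughly as follows. This is a sketch and not the actual contents of train.py: the flag names come from this README, but the default values, the help strings, and the "train" choice for --type are assumptions.

```python
import argparse


def build_parser():
    """Command-line parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(description="Train or evaluate the VQA model")
    # "val" is documented above; "train" as the default mode is an assumption.
    parser.add_argument("--type", choices=["train", "val"], default="train",
                        help="train the model or evaluate on the validation set")
    # Default values below are placeholders, not the repository's defaults.
    parser.add_argument("--epoch", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=256,
                        help="number of samples per batch")
    parser.add_argument("--data_limit", type=int, default=None,
                        help="use at most this many samples (all if omitted)")
    return parser


# Equivalent to: python train.py --type val --batch_size 128
args = build_parser().parse_args(["--type", "val", "--batch_size", "128"])
```

Passing an explicit argument list to parse_args, as done here, makes the parser easy to try out without running the script from a shell.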

Training Details

The preprocessed features are based on these scripts written by the VT Vision Lab team. They already contain the transformed image vectors, indexed tokens for the text, and other metadata, for both the training and validation sets.

Training was done on a g2.2xlarge spot instance on AWS. Multiple community AMIs with all the required packages pre-installed are available. The g2.2xlarge has an NVIDIA GRID K520 with 4 GB of memory and takes ~277 seconds/epoch for a batch size of 256. The model has been trained for 50 epochs and has an accuracy of 45.03% on the validation set. Also, the accuracy started decreasing after 70 epochs. Thus, there is a lot of scope for hyper-parameter tuning here.
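As a quick worked example, the figures quoted above imply the following total training time. The only inputs are the ~277 seconds/epoch and 50 epochs stated here; the helper name is hypothetical, and the result is an estimate because the per-epoch time is approximate.

```python
SECONDS_PER_EPOCH = 277  # approximate figure quoted above (batch size 256)
EPOCHS = 50


def format_duration(total_seconds):
    """Render a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


total_seconds = SECONDS_PER_EPOCH * EPOCHS
print(total_seconds, format_duration(total_seconds))  # 13850 3h 50m 50s
```

So a 50-epoch run on this hardware takes a little under four hours.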

Running the application

For details on how to run the demo app, check the docs in the app/ folder.

Feedback

If you have any feedback or suggestions, do ping me at [email protected]

More Repositories

1. Conditional-PixelCNN-decoder: Tensorflow implementation of Gated Conditional Pixel Convolutional Neural Network (Python, 486 stars)
2. Language-Modeling-GatedCNN: Tensorflow implementation of "Language Modeling with Gated Convolutional Networks" (Python, 270 stars)
3. Ostrich: E-commerce Rental Platform written in Flask & React (Python, 52 stars)
4. Medical-Diagnosis-Learning: Learning from Discharge Summaries to extract mentioned diagnoses using Hierarchical Attention Model (Jupyter Notebook, 34 stars)
5. BEGAN-pytorch: Implementation of BEGAN in Pytorch and other interpolation experiments (Python, 22 stars)
6. CUDA-Genetic-Algorithm-Travelling-Salesman-Problem: Implementation of Parallel Genetic Algorithm in CUDA to solve TSP (Berlin52) (Cuda, 5 stars)
7. Naive-Bayes-Classifier (PHP, 5 stars)
8. deep-dreamer-web: Web client for Google's deepdream (JavaScript, 3 stars)
9. Recurrent-Entity-Networks-pytorch: Pytorch implementation of "Tracking the World State with Recurrent Entity Networks" (Python, 2 stars)
10. iGoogle-interface: Basic iGoogle layout (widgets) (JavaScript, 2 stars)
11. PyBuddy: This is a simple AI Android App made in Python (Python, 2 stars)
12. Lister: Create and share lists (ApacheConf, 1 star)
13. Violence-detection-from-audio: Detection of violence from media by analyzing audio streams (MATLAB, 1 star)
14. Database-Anonymization: Database anonymization techniques : k-anonymity and l-diversity (PHP, 1 star)
15. Kaggle-Leaf-Classification: My solution to Kaggle's Leaf Classification competition (Jupyter Notebook, 1 star)
16. Comment-Spam-Classification-in-Reviews: This was a weekend project aiming to remove spams from comments in zomato food reviews. Zomato's comment data was used to train and test the classifier. (Python, 1 star)