Repository Details
A model for Visual Question Answering (VQA). Image features are extracted with the VGG16 architecture, and question features are extracted with an LSTM. The two sets of features are combined by a fully connected MLP, whose output is one of a pre-chosen set of the most probable answers.
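The description does not say how the pre-chosen answer set is built. A minimal sketch of one common approach, assuming the candidates are the `top_k` most frequent answers in the training data, is shown below. The function names, the `top_k` parameter, and the toy data are hypothetical and are not taken from the repository.

```python
from collections import Counter


def build_answer_vocab(train_answers, top_k):
    """Map the top_k most frequent answers to class indices (assumed scheme)."""
    counts = Counter(train_answers)
    # most_common returns (answer, count) pairs sorted by count, descending.
    return {
        answer: index
        for index, (answer, _count) in enumerate(counts.most_common(top_k))
    }


def encode_answer(answer, vocab):
    """Return the class index of an answer, or None if it is not a candidate."""
    return vocab.get(answer)


# Hypothetical toy training answers, for illustration only.
train_answers = ["yes", "no", "yes", "2", "yes", "no", "red"]
vocab = build_answer_vocab(train_answers, top_k=3)
```

Under this assumption the MLP's final layer would have `len(vocab)` outputs, and training examples whose answer falls outside the candidate set (where `encode_answer` returns `None`) would typically be skipped.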