Awesome Visual Question Answering:
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Contributing
Please feel free to send me pull requests or email ([email protected]) to add links. Markdown format:
- [Paper Name](link) - Author 1 et al, **Conference Year**. [[code]](link)
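If you are adding several papers, a small helper can generate entries in the format above. This is only a sketch: `format_entry` and its parameters are hypothetical names that are not part of this repository, and the URLs in the usage line are placeholders.

```python
def format_entry(title, paper_url, first_author, venue, year, code_url=None):
    """Build one list entry in the Markdown format requested above.

    All parameter names are illustrative assumptions, not an existing API.
    """
    # Title link, first author with "et al", then the bolded venue and year.
    entry = f"- [{title}]({paper_url}) - {first_author} et al, **{venue} {year}**."
    # The code link is optional; append it only when one is given.
    if code_url:
        entry += f" [[code]]({code_url})"
    return entry


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(format_entry("Paper Name", "https://example.com/paper",
                       "Author 1", "CVPR", 2019,
                       code_url="https://example.com/code"))
```

For a single-author paper, drop "et al" by hand after generating the line.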
Change Log
- Mar. 3rd, 2019: First version released.
Table of Contents
- Contributing
- Change Log
- Table of Contents
- Papers
- VQA Challenge Leaderboard
- Licenses
- Reference and Acknowledgement
Papers
Survey
- Visual question answering: Datasets, algorithms, and future challenges - Kushal Kafle et al, CVIU 2017.
- Visual question answering: A survey of methods and datasets - Qi Wu et al, CVIU 2017.
- Video Question Answering: Datasets, Algorithms and Challenges - Yaoyao Zhong et al, EMNLP 2022.
2022
EMNLP 2022
- Video Question Answering: Datasets, Algorithms and Challenges - Yaoyao Zhong et al, EMNLP 2022.
- Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering - Jialin Wu et al, EMNLP 2022.
- Retrieval Augmented Visual Question Answering with Outside Knowledge - Weizhe Lin et al, EMNLP 2022.
- CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering - Maitreya Patel et al, EMNLP 2022. [proj] [code]
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning - Qingyi Si et al, EMNLP 2022 (Findings). [code]
- Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training - Anthony Meng Huat Tiong et al, EMNLP 2022 (Findings).
- Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA - Qingyi Si et al, EMNLP 2022 (Findings). [code]
NeurIPS 2022
- REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering - Yuanze Lin et al, NeurIPS 2022.
- Towards Video Text Visual Question Answering: Benchmark and Baseline - Minyi Zhao et al, NeurIPS 2022.
ACL 2022
- CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment - Haoyu Song et al, ACL 2022.
- CARETS: A Consistency And Robustness Evaluative Test Suite for VQA - Carlos Jimenez et al, ACL 2022.
- Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering - Yu-Jung Heo et al, ACL 2022.
- DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering - Le Qi et al, ACL 2022 (Findings).
- xGQA: Cross-Lingual Visual Question Answering - Jonas Pfeiffer et al, ACL 2022 (Findings). [data]
- Co-VQA: Answering by Interactive Sub Question Sequence - Ruonan Wang et al, ACL 2022 (Findings).
CVPR 2022
- SimVQA: Exploring Simulated Environments for Visual Question Answering - Paola Cascante-Bonilla et al, CVPR 2022. [code]
- A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering - Feng Gao et al, CVPR 2022.
- SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question Answering - Vipul Gupta et al, CVPR 2022. [code]
- Dual-Key Multimodal Backdoors for Visual Question Answering - Matthew Walmer et al, CVPR 2022. [code]
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering - Yang Ding et al, CVPR 2022. [code]
- Grounding Answers for Visual Questions Asked by Visually Impaired People - Chongyan Chen et al, CVPR 2022. [page]
- Maintaining Reasoning Consistency in Compositional Visual Question Answering - Chenchen Jing et al, CVPR 2022. [code]
ICLR 2022
- RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning - Xiaojian Ma et al, ICLR 2022. [code]
AAAI 2022
- Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering - Mingxiao Li et al, AAAI 2022. [code]
IJCAI 2022
- Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering - Min Peng et al, IJCAI 2022. [code]
BMVC 2022
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation - Jun Wang et al, BMVC 2022. [code]
2021
NeurIPS 2021
- Human-Adversarial Visual Question Answering - Sasha Sheng et al, NeurIPS 2021. [code]
- Debiased Visual Question Answering from Feature and Sample Perspectives - Zhiquan Wen et al, NeurIPS 2021. [code]
- Learning to Generate Visual Questions with Noisy Supervision - Kai Shen et al, NeurIPS 2021. [code]
- ProTo: Program-Guided Transformer for Program-Guided Tasks - Zelin Zhao et al, NeurIPS 2021. [code]
EMNLP 2021
- Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering - Jihyung Kil et al, EMNLP 2021.
- Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking - Dirk Väth et al, EMNLP 2021 (demo). [code]
- Diversity and Consistency: Exploring Visual Question-Answer Pair Generation - Sen Yang et al, EMNLP 2021 (Findings).
- Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation - Humair Raj Khan et al, EMNLP 2021 (Findings).
- MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering - Junjie Wang et al, EMNLP 2021 (Findings). [code]
ICCV 2021
- Just Ask: Learning To Answer Questions From Millions of Narrated Videos - Antoine Yang et al, ICCV 2021.
- Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments - Difei Gao et al, ICCV 2021.
- On The Hidden Treasure of Dialog in Video Question Answering - Deniz Engin et al, ICCV 2021.
- Unshuffling Data for Improved Generalization in Visual Question Answering - Damien Teney et al, ICCV 2021.
- TRAR: Routing the Attention Spans in Transformer for Visual Question Answering - Yiyi Zhou et al, ICCV 2021.
- Greedy Gradient Ensemble for Robust Visual Question Answering - Xinzhe Han et al, ICCV 2021.
- Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos - Heeseung Yun et al, ICCV 2021.
- Weakly Supervised Relative Spatial Reasoning for Visual Question Answering - Pratyay Banerjee et al, ICCV 2021.
- Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering - Qingxing Cao et al, ICCV 2021.
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering - Corentin Dancette et al, ICCV 2021.
- Auto-Parsing Network for Image Captioning and Visual Question Answering - Xu Yang et al, ICCV 2021.
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue - Shoya Matsumori et al, ICCV 2021.
ACL 2021
- Check It Again: Progressive Visual Question Answering via Visual Entailment - Qingyi Si et al, ACL 2021. [code]
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering - Siddharth Karamcheti et al, ACL 2021. [code]
- In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering - Peter Vickers et al, ACL 2021.
- Towards Visual Question Answering on Pathology Images - Xuehai He et al, ACL 2021. [code]
- Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions - Daniel Rosenberg et al, ACL 2021. [code]
SIGIR 2021
- LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering - Zujie Liang et al, SIGIR 2021. [code]
- Passage Retrieval for Outside-Knowledge Visual Question Answering - Chen Qu et al, SIGIR 2021. [code]
- Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering - Aman Jain et al, SIGIR 2021. [code]
- Visual Question Rewriting for Increasing Response Rate - Jiayi Wei et al, SIGIR 2021.
CVPR 2021
- Separating Skills and Concepts for Novel Visual Question Answering - Spencer Whitehead et al, CVPR 2021.
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? - Corentin Kervadec et al, CVPR 2021. [code]
- Predicting Human Scanpaths in Visual Question Answering - Xianyu Chen et al, CVPR 2021.
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules - Aisha Urooj et al, CVPR 2021.
- TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption - Zhengyuan Yang et al, CVPR 2021.
- Counterfactual VQA: A Cause-Effect Look at Language Bias - Yulei Niu et al, CVPR 2021. [code]
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA - Kenneth Marino et al, CVPR 2021.
- Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing - Yuanyuan Yuan et al, CVPR 2021.
- How Transferable Are Reasoning Patterns in VQA? - Corentin Kervadec et al, CVPR 2021.
- Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels - Mingda Zhang et al, CVPR 2021.
- Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation - Tao Tu et al, CVPR 2021.
ICLR 2021
- MultiModalQA: complex question answering over text, tables and images - Alon Talmor et al, ICLR 2021. [page]
NAACL-HLT 2021
- CLEVR_HYP: A Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images - Shailaja Keyur Sampat et al, NAACL-HLT 2021. [code]
- Video Question Answering with Phrases via Semantic Roles - Arka Sadhu et al, NAACL-HLT 2021.
- SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency - Sameer Dharur et al, NAACL-HLT 2021.
- EaSe: A Diagnostic Tool for VQA based on Answer Diversity - Shailza Jolly et al, NAACL-HLT 2021.
- Ensemble of MRR and NDCG models for Visual Dialog - Idan Schwartz, NAACL-HLT 2021. [code]
AAAI 2021
- Regularizing Attention Networks for Anomaly Detection in Visual Question Answering - Doyup Lee et al, AAAI 2021.
- A Case Study of the Shortcut Effects in Visual Commonsense Reasoning - Keren Ye et al, AAAI 2021. [code]
- VisualMRC: Machine Reading Comprehension on Document Images - Ryota Tanaka et al, AAAI 2021. [page]
2020
EMNLP 2020
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering - Tejas Gokhale et al, EMNLP 2020. [code]
- Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering - Zujie Liang et al, EMNLP 2020. [code]
- VD-BERT: A Unified Vision and Dialog Transformer with BERT - Yue Wang et al, EMNLP 2020.
NeurIPS 2020
- Multimodal Graph Networks for Compositional Generalization in Visual Question Answering - Raeid Saqur et al, NeurIPS 2020.
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies - Itai Gat et al, NeurIPS 2020.
- Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data - Michael Cogswell et al, NeurIPS 2020.
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law - Damien Teney et al, NeurIPS 2020.
ECCV 2020
- Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder - Gouthaman KV et al, ECCV 2020.
- Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions - Noa Garcia et al, ECCV 2020.
- Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering - Ruixue Tang et al, ECCV 2020.
- Visual Question Answering on Image Sets - Ankan Bansal et al, ECCV 2020.
- VQA-LOL: Visual Question Answering under the Lens of Logic - Tejas Gokhale et al, ECCV 2020.
- TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering - Xiaofeng Yang et al, ECCV 2020.
- Spatially Aware Multimodal Transformers for TextVQA - Yash Kant et al, ECCV 2020.
CVPR 2020
- Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text - Difei Gao et al, CVPR 2020. [code]
- On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering - Xinyu Wang et al, CVPR 2020.
- In Defense of Grid Features for Visual Question Answering - Huaizu Jiang et al, CVPR 2020.
- Counterfactual Samples Synthesizing for Robust Visual Question Answering - Long Chen et al, CVPR 2020.
- Counterfactual Vision and Language Learning - Ehsan Abbasnejad et al, CVPR 2020.
- Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA - Ronghang Hu et al, CVPR 2020.
- Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing - Vedika Agarwal et al, CVPR 2020.
- SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions - Ramprasaath R. Selvaraju et al, CVPR 2020.
- TA-Student VQA: Multi-Agents Training by Self-Questioning - Peixi Xiong et al, CVPR 2020.
- VQA With No Questions-Answers Training - Ben-Zion Vatashsky et al, CVPR 2020.
- Hierarchical Conditional Relation Networks for Video Question Answering - Thao Minh Le et al, CVPR 2020.
- Modality Shifting Attention Network for Multi-Modal Video Question Answering - Junyeong Kim et al, CVPR 2020.
- Webly Supervised Knowledge Embedding Model for Visual Reasoning - Wenbo Zheng et al, CVPR 2020.
- Differentiable Adaptive Computation Time for Visual Reasoning - Cristobal Eyzaguirre et al, CVPR 2020.
ACL 2020
- A negative case analysis of visual grounding methods for VQA - Robik Shrestha et al, ACL 2020.
- Cross-Modality Relevance for Reasoning on Language and Vision - Chen Zheng et al, ACL 2020.
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA - Hyounghun Kim et al, ACL 2020.
- TVQA+: Spatio-Temporal Grounding for Video Question Answering - Jie Lei et al, ACL 2020.
WACV 2020
- BERT representations for Video Question Answering - Zekun Yang et al, WACV 2020.
- Deep Bayesian Network for Visual Question Generation - Badri Patro et al, WACV 2020.
- Robust Explanations for Visual Question Answering - Badri Patro et al, WACV 2020.
- Visual Question Answering on 360° Images - Shih-Han Chou et al, WACV 2020.
- LEAF-QA: Locate, Encode & Attend for Figure Question Answering - Ritwick Chaudhry et al, WACV 2020.
- Answering Questions about Data Visualizations using Efficient Bimodal Fusion - Kushal Kafle et al, WACV 2020.
AAAI 2020
- Multi-Question Learning for Visual Question Answering - Chenyi Lei et al, AAAI 2020.
- Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA - Badri N. Patro et al, AAAI 2020.
- Overcoming Language Priors in VQA via Decomposed Linguistic Representations - Chenchen Jing et al, AAAI 2020.
- Unified Vision-Language Pre-Training for Image Captioning and VQA - Luowei Zhou et al, AAAI 2020.
- Re-Attention for Visual Question Answering - Wenya Guo et al, AAAI 2020.
- Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering - Jianwen Jiang et al, AAAI 2020.
- Reasoning with Heterogeneous Graph Alignment for Video Question Answering - Pin Jiang et al, AAAI 2020.
- Location-aware Graph Convolutional Networks for Video Question Answering - Deng Huang et al, AAAI 2020.
- KnowIT VQA: Answering Knowledge-Based Questions about Videos - Noa Garcia et al, AAAI 2020.
2019
ACL 2019
- Generating Question Relevant Captions to Aid Visual Question Answering - Jialin Wu et al, ACL 2019.
- Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering - Claudio Greco et al, ACL 2019. [code]
- Multi-grained Attention with Object-level Grounding for Visual Question Answering - Pingping Huang et al, ACL 2019.
- Improving Visual Question Answering by Referring to Generated Paragraph Captions - Hyounghun Kim et al, ACL 2019.
ICCV 2019
- Compact Trilinear Interaction for Visual Question Answering - Tuong Do Kim et al, ICCV 2019.
- Scene Text Visual Question Answering - Ali Furkan Biten et al, ICCV 2019.
- Multi-Modality Latent Interaction Network for Visual Question Answering - Peng Gao et al, ICCV 2019.
- Relation-Aware Graph Attention Network for Visual Question Answering - Linjie Li et al, ICCV 2019.
- Why Does a Visual Question Have Different Answers? - Nilavra Bhattacharya et al, ICCV 2019.
NeurIPS 2019
- RUBi: Reducing Unimodal Biases for Visual Question Answering - Remi Cadene et al, NeurIPS 2019.
- Self-Critical Reasoning for Robust Visual Question Answering - Jialin Wu et al, NeurIPS 2019.
CVPR 2019
- Deep Modular Co-Attention Networks for Visual Question Answering - Zhou Yu et al, CVPR 2019. [code]
- Information Maximizing Visual Question Generation - Ranjay Krishna et al, CVPR 2019. [code]
- Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence - Amir Zadeh et al, CVPR 2019. [code]
- Learning to Compose Dynamic Tree Structures for Visual Contexts - Kaihua Tang et al, CVPR 2019. [code]
- Transfer Learning via Unsupervised Task Discovery for Visual Question Answering - Hyeonwoo Noh et al, CVPR 2019. [code]
- Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph - Yao-Hung Hubert Tsai et al, CVPR 2019. [code]
- Explainable and Explicit Visual Reasoning over Scene Graphs - Jiaxin Shi et al, CVPR 2019. [code]
- MUREL: Multimodal Relational Reasoning for Visual Question Answering - Remi Cadene et al, CVPR 2019. [code]
- Image-Question-Answer Synergistic Network for Visual Dialog - Dalu Guo et al, CVPR 2019. [code]
- RAVEN: A Dataset for Relational and Analogical Visual rEasoNing - Chi Zhang et al, CVPR 2019. [project page]
- Cycle-Consistency for Robust Visual Question Answering - Meet Shah et al, CVPR 2019.
- It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning - Monica Haurilet et al, CVPR 2019.
- OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge - Kenneth Marino et al, CVPR 2019.
- Visual Question Answering as Reading Comprehension - Hui Li et al, CVPR 2019.
- Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering - Peng Gao et al, CVPR 2019.
- Explicit Bias Discovery in Visual Question Answering Models - Varun Manjunatha et al, CVPR 2019.
- Answer Them All! Toward Universal Visual Question Answering Models - Robik Shrestha et al, CVPR 2019.
- Visual Query Answering by Entity-Attribute Graph Matching and Reasoning - Peixi Xiong et al, CVPR 2019.
AAAI 2019
- Differential Networks for Visual Question Answering - Chenfei Wu et al, AAAI 2019. [code]
- BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection - Hedi Ben-younes et al, AAAI 2019. [code]
- Dynamic Capsule Attention for Visual Question Answering - Yiyi Zhou et al, AAAI 2019. [code]
- Structured Two-stream Attention Network for Video Question Answering - Lianli Gao et al, AAAI 2019. [code]
- Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering - Xiangpeng Li et al, AAAI 2019. [code]
- WK-VQA: World Knowledge-enabled Visual Question Answering - Sanket Shah et al, AAAI 2019. [code]
- Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning - Yiyi Zhou et al, AAAI 2019. [code]
OTHER
- Focal Visual-Text Attention for Memex Question Answering - Junwei Liang et al, TPAMI 2019. [code]
- Plenty is Plague: Fine-Grained Learning for Visual Question Answering - Yiyi Zhou et al, TPAMI 2019.
- Combining Multiple Cues for Visual Madlibs Question Answering - Tatiana Tommasi et al, IJCV 2019. [code]
- Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation - Sang-Woo Lee et al, ICLR 2019. [code]
2018
NIPS 2018
- Bilinear Attention Networks - Jin-Hwa Kim et al, NIPS 2018. [code]
- Chain of Reasoning for Visual Question Answering - Chenfei Wu et al, NIPS 2018. [code]
- Learning Conditioned Graph Structures for Interpretable Visual Question Answering - Will Norcliffe-Brown et al, NIPS 2018. [code]
- Learning to Specialize with Knowledge Distillation for Visual Question Answering - Jonghwan Mun et al, NIPS 2018. [code]
- Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering - Medhini Narasimhan et al, NIPS 2018. [code]
- Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - Sainandan Ramakrishnan et al, NIPS 2018. [code]
AAAI 2018
- Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering - Somak Aditya et al, AAAI 2018. [code]
- Co-Attending Free-Form Regions and Detections with Multi-Modal Multiplicative Feature Embedding for Visual Question Answering - Pan Lu et al, AAAI 2018. [code]
- Exploring Human-Like Attention Supervision in Visual Question Answering - Tingting Qiao et al, AAAI 2018. [code]
- Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents - Bo Wang et al, AAAI 2018. [code]
IJCAI 2018
- Feature Enhancement in Attention for Visual Question Answering - Yuetan Lin et al, IJCAI 2018. [code]
- A Question Type Driven Framework to Diversify Visual Question Generation - Zhihao Fan et al, IJCAI 2018. [code]
- Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network - Zhou Zhao et al, IJCAI 2018. [code]
- Open-Ended Long-form Video Question Answering via Adaptive Hierarchical Reinforced Networks - Zhou Zhao et al, IJCAI 2018. [code]
CVPR 2018
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Peter Anderson et al, CVPR 2018. [code(author)] [code(pythiaV0.1)] [code(Pytorch Reimplementation)]
- Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge - Damien Teney et al, CVPR 2018. [code]
- Learning by Asking Questions - Ishan Misra et al, CVPR 2018. [code]
- Embodied Question Answering - Abhishek Das et al, CVPR 2018. [code]
- VizWiz Grand Challenge: Answering Visual Questions From Blind People - Danna Gurari et al, CVPR 2018. [code]
- Textbook Question Answering Under Instructor Guidance With Memory Networks - Juzheng Li et al, CVPR 2018. [code]
- IQA: Visual Question Answering in Interactive Environments - Daniel Gordon et al, CVPR 2018. [sample video]
- Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - Aishwarya Agrawal et al, CVPR 2018. [code]
- Learning Answer Embeddings for Visual Question Answering - Hexiang Hu et al, CVPR 2018. [code]
- DVQA: Understanding Data Visualizations via Question Answering - Kushal Kafle et al, CVPR 2018. [code]
- Cross-Dataset Adaptation for Visual Question Answering - Wei-Lun Chao et al, CVPR 2018. [code]
- Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering - Unnat Jain et al, CVPR 2018. [code]
- Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering - Duy-Kien Nguyen et al, CVPR 2018. [code]
- Visual Question Generation as Dual Task of Visual Question Answering - Yikang Li et al, CVPR 2018. [code]
- Focal Visual-Text Attention for Visual Question Answering - Junwei Liang et al, CVPR 2018. [code]
- Motion-Appearance Co-Memory Networks for Video Question Answering - Jiyang Gao et al, CVPR 2018. [code]
- Visual Question Answering With Memory-Augmented Networks - Chao Ma et al, CVPR 2018. [code]
- Visual Question Reasoning on General Dependency Tree - Qingxing Cao et al, CVPR 2018. [code]
- Differential Attention for Visual Question Answering - Badri Patro et al, CVPR 2018. [code]
- Learning Visual Knowledge Memory Networks for Visual Question Answering - Zhou Su et al, CVPR 2018. [code]
- IVQA: Inverse Visual Question Answering - Feng Liu et al, CVPR 2018. [code]
- Customized Image Narrative Generation via Interactive Visual Question Generation and Answering - Andrew Shin et al, CVPR 2018. [code]
ACM MM 2018
- Object-Difference Attention: A simple relational attention for Visual Question Answering - Chenfei Wu et al, ACM MM 2018. [code]
- Enhancing Visual Question Answering Using Dropout - Zhiwei Fang et al, ACM MM 2018. [code]
- Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering - Xuanyi Dong et al, ACM MM 2018. [code]
- Explore Multi-Step Reasoning in Video Question Answering - Xiaomeng Song et al, ACM MM 2018. [code] [SVQA dataset]
ECCV 2018
- Visual Question Answering as a Meta Learning Task - Damien Teney et al, ECCV 2018. [code]
- Question-Guided Hybrid Convolution for Visual Question Answering - Peng Gao et al, ECCV 2018. [code]
- Goal-Oriented Visual Question Generation via Intermediate Rewards - Junjie Zhang et al, ECCV 2018. [code]
- Multimodal Dual Attention Memory for Video Story Question Answering - Kyung-Min Kim et al, ECCV 2018. [code]
- A Joint Sequence Fusion Model for Video Question Answering and Retrieval - Youngjae Yu et al, ECCV 2018. [code]
- Deep Attention Neural Tensor Network for Visual Question Answering - Yalong Bai et al, ECCV 2018. [code]
- Question Type Guided Attention in Visual Question Answering - Yang Shi et al, ECCV 2018. [code]
- Learning Visual Question Answering by Bootstrapping Hard Attention - Mateusz Malinowski et al, ECCV 2018. [code]
- Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering - Medhini Narasimhan et al, ECCV 2018. [code]
- Visual Question Generation for Class Acquisition of Unknown Objects - Kohei Uehara et al, ECCV 2018. [code]
OTHER
- Image Captioning and Visual Question Answering Based on Attributes and External Knowledge - Qi Wu et al, TPAMI 2018. [code]
- FVQA: Fact-Based Visual Question Answering - Peng Wang et al, TPAMI 2018. [code]
- R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering - Pan Lu et al, SIGKDD 2018. [code(Dataset)]
- Interpretable Counting for Visual Question Answering - Alexander Trott et al, ICLR 2018. [code]
- Learning to Count Objects in Natural Images for Visual Question Answering - Yan Zhang et al, ICLR 2018. [code]
- A Better Way to Attend: Attention With Trees for Video Question Answering - Hongyang Xue et al, TIP 2018. [code]
- Zero-Shot Transfer VQA Dataset - Pan Lu et al, arXiv preprint. [code]
- Visual Question Answering using Explicit Visual Attention - Vasileios Lioutas et al, ISCAS 2018. [code]
- Explicit ensemble attention learning for improving visual question answering - Vasileios Lioutas et al, Pattern Recognition Letters 2018. [code]
2017-2015
OTHER
For other VQA papers from 2015 to 2017, please check awesome-vqa by JamesChuanggg; that project does not appear to have been maintained for a long time. I really appreciate his work and will merge it into this list in the future. Stay tuned...
ICCV 2017
- Learning to Reason: End-to-End Module Networks for Visual Question Answering - Ronghang Hu et al, ICCV 2017. [code]
- Structured Attentions for Visual Question Answering - Chen Zhu et al, ICCV 2017. [code]
- VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation - Chuang Gan et al, ICCV 2017. [code]
- Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering - Zhou Yu et al, ICCV 2017. [code]
- An Analysis of Visual Question Answering Algorithms - Kushal Kafle et al, ICCV 2017. [code]
- MUTAN: Multimodal Tucker Fusion for Visual Question Answering - Hedi Ben-younes et al, ICCV 2017. [code]
- MarioQA: Answering Questions by Watching Gameplay Videos - Jonghwan Mun et al, ICCV 2017. [code]
- Learning to Disambiguate by Asking Discriminative Questions - Yining Li et al, ICCV 2017. [code]
VQA Challenge Leaderboard
I will collect the leaderboard implementations in the future. Stay tuned...
test-std 2018
test-std 2017
TextVQA
VQA-CP
Licenses
To the extent possible under law, Jokie Leung has waived all copyright and related or neighboring rights to this work.
Reference and Acknowledgement
I really appreciate their contributions in this area.