Answer-Type Prediction for Visual Question Answering

Citation data:

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Page: 4976-4984

Publication Year:
2016
Repository URL:
http://scholarworks.rit.edu/other/887
DOI:
10.1109/cvpr.2016.538
Author(s):
Kafle, Kushal; Kanan, Christopher
Publisher(s):
Institute of Electrical and Electronics Engineers (IEEE)
Tags:
machine learning; computer vision; natural language processing
Document type:
conference paper
Description:
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach’s key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
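The abstract's key idea, predicting the form of the answer from the question within a Bayesian framework, can be read as marginalizing over a latent answer type. The sketch below illustrates that reading only: the function name, the answer types, and all probabilities are hypothetical, and this record does not describe the paper's actual model, which may differ.

```python
# Toy illustration only: names, answer types, and numbers are assumptions,
# not taken from the paper.

def predict_answer(p_type_given_q, p_answer_given_type):
    """Score answers as P(a | x, q) = sum_t P(a | x, q, t) * P(t | q)."""
    scores = {}
    for answer_type, p_type in p_type_given_q.items():
        for answer, p_ans in p_answer_given_type[answer_type].items():
            # Weight each type-conditional answer probability by the type probability.
            scores[answer] = scores.get(answer, 0.0) + p_type * p_ans
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical question: "How many dogs are in the picture?"
# P(type | question): the question wording strongly suggests a counting answer.
p_type_given_q = {"count": 0.9, "color": 0.1}

# P(answer | image, question, type): hypothetical per-type answer distributions.
p_answer_given_type = {
    "count": {"two": 0.7, "three": 0.3},
    "color": {"red": 0.6, "blue": 0.4},
}

best, scores = predict_answer(p_type_given_q, p_answer_given_type)
print(best, scores)
```

In this toy setup the type prediction acts as a prior that suppresses answers of the wrong form (colors for a counting question) before the image-dependent scores are compared.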