On generating characteristic-rich question sets for QA evaluation

Yu Su; Huan Sun; Brian Sadler; Mudhakar Srivatsa; Izzeddin Gur; Zenghui Yan; Xifeng Yan

doi:10.18653/v1/d16-1054

Publication

EMNLP 2016

Conference paper

Abstract

We present a semi-automated framework for constructing factoid question answering (QA) datasets, in which an array of question characteristics is formalized, including structure complexity, function, commonness, answer cardinality, and paraphrasing. Instead of collecting questions and manually characterizing them, we employ a reverse procedure: we first generate graph-structured logical forms from a knowledge base and then convert them into questions. Our work is the first to generate questions with explicitly specified characteristics for QA evaluation. We construct a new QA dataset with over 5,000 logical form-question pairs, associated with answers from the knowledge base, and show that datasets constructed in this way enable fine-grained analyses of QA systems. The dataset is available at https://github.com/ysu1989/GraphQuestions.
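The reverse procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the path-shaped query, and the names `build_index`, `answer_path_query`, and `describe` are all assumptions made for illustration. The sketch builds a small logical form over a knowledge base, evaluates it, and records two of the characteristics named above (structure complexity, taken here to be the number of edges, and answer cardinality). Converting the logical form into a natural-language question is outside its scope.

```python
from collections import defaultdict

# Toy knowledge base of (subject, relation, object) triples.
# Entirely made up for illustration; not taken from the dataset.
TOY_KB = [
    ("ada", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "springfield"),
    ("cy", "works_at", "initech"),
    ("initech", "located_in", "springfield"),
]


def build_index(triples):
    """Map (object, relation) -> set of subjects, so edges can be followed backwards."""
    index = defaultdict(set)
    for subj, rel, obj in triples:
        index[(obj, rel)].add(subj)
    return index


def answer_path_query(index, topic_entity, relations):
    """Evaluate a path-shaped query: start at the topic entity and
    follow each relation backwards, one hop per relation."""
    frontier = {topic_entity}
    for rel in relations:
        frontier = {
            subj
            for node in frontier
            for subj in index.get((node, rel), set())
        }
    return frontier


def describe(index, topic_entity, relations):
    """Pair a logical form with its answers and two simple characteristics."""
    answers = answer_path_query(index, topic_entity, relations)
    return {
        "logical_form": {"topic": topic_entity, "path": list(relations)},
        "structure_complexity": len(relations),  # assumption: edge count
        "answer_cardinality": len(answers),
        "answers": sorted(answers),
    }


index = build_index(TOY_KB)
# Roughly: "Who works at an organization located in springfield?"
record = describe(index, "springfield", ["located_in", "works_at"])
print(record)
```

Because the characteristics are computed from the logical form before any question is written, a dataset built this way can be sliced by them directly, which is what makes the fine-grained analysis possible.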

Date

01 Nov 2016

Authors

IBM-affiliated at time of publication
