Publication
CIKM 2007
Conference paper

An experimental study of the impact of information extraction accuracy on semantic search performance

View publication

Abstract

Researchers have shown that various natural language processing techniques can be used in document analysis to impact search performance. For the most part, they examined how an analysis system with certain performance characteristics can be leveraged to improve document and/or passage search results. We have previously shown that semantic queries which utilize named entity and relation information extracted from the corpus can lead to significant improvement in search performance. In this paper, we extend our previous efforts and examine how search performance degrades in the face of imperfect named entity and relation extraction. Our study was carried out by developing gold standard annotated corpora and applying different error models to the gold standard annotations to simulate errors made by automatic recognizers. We identify automatic recognizer characteristics that make them more amenable to our search tasks, show that recognizer recall has more significant impact on semantic search performance than its precision, and demonstrate that significant improvement in both MAP and Exact Precision scores can be achieved by adopting automatic named entity and relation recognizers with near state-of-the-art performance. Copyright 2007 ACM.

Date

Publication

CIKM 2007

Authors

Share