Exploiting clustering and phrases for context-based information retrieval
Abstract
This paper explores the synergy between document clustering and phrasal analysis and exploits it to automatically construct a context-based retrieval system. A context consists of two components: a cluster of logically related articles (its extension) and a small set of salient concepts, represented by words and phrases and organized by the cluster's key terms (its intension). At run time, the system presents the contexts that best match the result list of a user's natural language query. The user can then choose a context and manipulate its intensional component both to browse the context's extension and to launch new searches over the entire database. We argue that the focused relevance feedback provided by contexts, at a level of abstraction higher than individual documents but lower than the database as a whole, gives users a natural way to refine vague information needs and helps to blur the distinction between searching and browsing. The Paraphrase interface, running over a database of business-related news articles, is used to illustrate the advantages of this context-based retrieval paradigm. Copyright 1997 ACM.
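As a rough illustration of the context structure and the run-time matching step described above, the following minimal Python sketch models a context as an extension (a set of document ids) plus an intension (a list of salient terms) and ranks contexts against a query's result list. The abstract does not say how "best match" is scored. The overlap score used here (fraction of retrieved documents that fall in a context's extension) is an assumption, and every name (`Context`, `rank_contexts`, the sample document ids and terms) is hypothetical rather than taken from the Paraphrase system.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical context: a cluster of documents plus its salient concepts."""
    name: str
    extension: set[str]  # ids of the logically related articles in the cluster
    intension: list[str] = field(default_factory=list)  # salient words and phrases


def rank_contexts(contexts, result_ids, top_k=3):
    """Rank contexts by overlap with a query's result list.

    Assumed score: |extension & results| / |results|. Contexts with no
    overlap are dropped; ties keep their original order (stable sort).
    """
    results = set(result_ids)
    if not results:
        return []
    scored = [(len(c.extension & results) / len(results), c) for c in contexts]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [(c.name, round(s, 2)) for s, c in scored[:top_k]]


# Hypothetical business-news contexts and a query result list.
contexts = [
    Context("mergers", {"d1", "d2", "d3"}, ["acquisition", "hostile takeover"]),
    Context("earnings", {"d4", "d5"}, ["quarterly profit"]),
    Context("layoffs", {"d6"}, ["job cuts"]),
]
print(rank_contexts(contexts, ["d1", "d2", "d4", "d9"]))
# [('mergers', 0.5), ('earnings', 0.25)]
```

Normalizing by the size of the result list favors contexts that cover much of what the query retrieved; a real system might instead weight by rank position or by term similarity between the query and each intension.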