Publication
VLDB 2000
Conference paper

What is the nearest neighbor in high dimensional spaces?

Abstract

Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regards to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast to previous approaches, our new notion of nearest neighbor search does not treat all dimensions equally but uses a quality criterion to select relevant dimensions (projections) with respect to the given query. As an example for a useful quality criterion, we rate how well the data is clustered around the query point within the selected projection. We then propose an efficient and effective algorithm to solve the generalized nearest neighbor problem. Our experiments based on a number of real and synthetic data sets show that our new approach provides new insights into the nature of nearest neighbor search on high dimensional data.
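To make the idea of query-dependent projection selection concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: the function names, the exhaustive search over axis-parallel projections, and the ratio-based quality score are all illustrative assumptions, chosen only to show how a quality criterion can pick the dimensions in which the data is clustered around the query.

```python
import itertools
import math


def projected_distance(p, q, dims):
    """Euclidean distance between p and q restricted to the dimensions in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def projection_quality(data, query, dims, m):
    """Toy quality score (an assumption, not the paper's criterion).

    Ratio of the mean distance to all points over the mean distance to the
    m closest points, both measured inside the projection. A larger value
    means the query's neighborhood is tight relative to the overall spread.
    """
    dists = sorted(projected_distance(p, query, dims) for p in data)
    near = sum(dists[:m]) / m
    overall = sum(dists) / len(dists)
    return overall / (near + 1e-12)  # small constant avoids division by zero


def generalized_nn(data, query, k, m=3):
    """Score every k-dimensional axis-parallel projection exhaustively and
    return (best_dims, nearest_point) for the highest-scoring projection."""
    n_dims = len(query)
    best_dims = max(
        itertools.combinations(range(n_dims), k),
        key=lambda dims: projection_quality(data, query, dims, m),
    )
    nearest = min(data, key=lambda p: projected_distance(p, query, best_dims))
    return best_dims, nearest


# Hypothetical data: dimensions 0 and 1 hold a cluster around the query,
# while dimension 2 is noisy for exactly those clustered points.
data = [
    (0.05, 0.0, 9.0),
    (0.0, 0.1, -7.0),
    (-0.1, 0.0, 4.0),
    (5.0, 5.0, 0.1),
    (-6.0, 4.0, 0.2),
    (7.0, -5.0, -0.1),
]
query = (0.0, 0.0, 0.0)
best_dims, nearest = generalized_nn(data, query, k=2)
print(best_dims, nearest)
```

On this toy data the sketch selects the projection onto dimensions 0 and 1 and returns (0.05, 0.0, 9.0), whereas a full-space Euclidean search would return (-0.1, 0.0, 4.0). Enumerating all projections is exponential in the number of dimensions, so this only illustrates the problem statement; an efficient method would need a smarter search.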
