On point sampling versus space sampling for dimensionality reduction
Abstract
n recent years, random projection has been used as a valuable tool for performing dimensionality reduction of high dimensional data. Starting with the seminal work of Johnson and Lindenstrauss [8], a number of interesting implementations of the random projection techniques have been proposed for dimensionality reduction. These techniques are mostly space symmetric random projections in which random hyperplanes are sampled in order to construct the projection. While these methods can provide effective reductions with worst-case bounds, they are not sensitive to the fact that the underlying data may have much lower implicit dimensionality than the full dimensionality. This may often be the case in many real applications. In this work, we analyze the theoretical effectiveness of point sampled random projections, in which the sampled hyperplanes are denned in terms of points sampled from the data. We show that point sampled random projections can be significantly more effective in most data sets, since the implicit dimensionality is usually significantly lower than the full dimensionality. In pathological cases, where space sampled random projections are better, it is possible to use a mixture of the two methods to design a random projection method with excellent average case behavior, while retaining the worst case behavior of space sampled random projections.