Publication
IEEE TKDE
Paper

On the analytical properties of high-dimensional randomization

Abstract

In this paper, we provide the first comprehensive analysis of high-dimensional randomization. The goal is to examine the strengths and weaknesses of randomization and to explore both its potential and its pitfalls in high dimensions. Our theoretical analysis yields several conclusions. 1) The privacy effects of randomization diminish rapidly with increasing dimensionality. 2) The properties of the underlying data set can affect the anonymity level of the randomization method. For example, natural properties of real data sets, such as clustering, improve the effectiveness of randomization. On the other hand, variations in the data density of nonempty data localities, and outliers, create privacy-preservation challenges for the randomization method. 3) The use of a public-information-sensitive attack method makes the choice of perturbing distribution more critical than previously thought. In particular, Gaussian perturbations are significantly more effective than uniformly distributed perturbations in the high-dimensional case. We use these insights to discuss and suggest future research directions for improving and extending the randomization method. © 1989-2012 IEEE.
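
The first conclusion, that privacy erodes as dimensionality grows, can be illustrated with a small simulation. The sketch below is not the paper's analysis; it is a toy setup under stated assumptions: records are drawn uniformly from the unit hypercube, each attribute is perturbed with independent Gaussian noise, and a hypothetical adversary who holds the original records as public information links each perturbed record to its nearest original record. The function name `reidentification_rate` and all parameter values are illustrative choices.

```python
import random


def reidentification_rate(n_records, n_dims, sigma, seed=0):
    """Fraction of perturbed records that a nearest-record attack links
    back to the correct original record (toy model, not the paper's)."""
    rng = random.Random(seed)

    # Assumption: original data is uniform on [0, 1]^n_dims.
    original = [[rng.random() for _ in range(n_dims)]
                for _ in range(n_records)]

    # Additive randomization: independent Gaussian noise on every attribute.
    perturbed = [[x + rng.gauss(0.0, sigma) for x in record]
                 for record in original]

    hits = 0
    for i, z in enumerate(perturbed):
        # The adversary picks the public record closest in squared
        # Euclidean distance to the perturbed record.
        best = min(
            range(n_records),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(z, original[j])),
        )
        if best == i:
            hits += 1
    return hits / n_records


if __name__ == "__main__":
    # Same noise level per attribute, increasing number of attributes.
    for d in (1, 5, 20, 50):
        rate = reidentification_rate(n_records=100, n_dims=d, sigma=0.3)
        print(f"dims={d:3d}  re-identification rate={rate:.2f}")
```

With the per-attribute noise level held fixed, every additional attribute gives the adversary another noisy but independent clue, so correct linkage becomes more likely as the number of dimensions increases. Swapping `rng.gauss` for a uniform perturbation is one way to experiment with the third conclusion, though a fair comparison would need matched variances and an attack suited to the uniform distribution.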
