Publication
SDM 2007
Conference paper

On Privacy-Preservation of Text and Sparse Binary Data with Sketches

Abstract

In recent years, privacy-preserving data mining has become very important because of the proliferation of large amounts of data on the Internet. Many data sets are inherently high-dimensional, which makes them challenging for many privacy-preservation algorithms. However, data sets in some domains also have special properties that make sketch-based techniques particularly useful. In this paper, we present a new method for privacy-preserving data mining of text and binary data using a sketch-based approach. The property we exploit is sparsity: only a small percentage of the attributes have non-zero values. We formalize an anonymity model for the sketch-based approach and use it to construct sketch-based, privacy-preserving representations of the original data. This representation allows accurate computation of a number of important data mining primitives, such as the dot product, and can therefore be used in a variety of data mining algorithms such as clustering and classification. We illustrate the effectiveness of our approach on a number of real and synthetic data sets, and show that the transformation preserves the accuracy of data mining algorithms even as data dimensionality increases.
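
The abstract states that a sketch representation supports accurate dot-product computation and that sparsity is what makes sketches attractive here. As a minimal sketch of that general idea only, assuming a plain AMS-style random-sign projection (a swapped-in textbook technique, not the paper's construction, and without its anonymity model), the Python below sketches a sparse vector by visiting only its non-zero attributes and estimates the dot product of two vectors from their sketches. The names `make_sign_table`, `sketch` and `estimate_dot`, and all parameter values, are hypothetical.

```python
import random


def make_sign_table(dim, num_sketch, seed=0):
    """Hypothetical helper: draw a +1/-1 sign r[j][i] for every sketch
    component j and attribute i. Signs are fully independent here for
    simplicity; hash-based 4-wise independent signs would avoid storing
    the table."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(num_sketch)]


def sketch(sparse_vec, signs):
    """Sketch a sparse vector given as {attribute_index: value}.

    Each component is sum_i r[j][i] * x_i, but only the non-zero
    attributes are visited, so the cost scales with the number of
    non-zeros rather than with the full dimensionality."""
    return [sum(row[i] * v for i, v in sparse_vec.items()) for row in signs]


def estimate_dot(sk_x, sk_y):
    """Estimate <x, y> as the average product of matching sketch components.

    Because E[r_i * r_k] is 1 when i == k and 0 otherwise, each product is
    an unbiased estimate of the dot product; averaging reduces variance."""
    return sum(a * b for a, b in zip(sk_x, sk_y)) / len(sk_x)


if __name__ == "__main__":
    signs = make_sign_table(dim=100, num_sketch=4000, seed=42)
    doc_a = {i: 1 for i in range(10)}      # binary "document" with words 0..9
    doc_b = {i: 1 for i in range(7, 17)}   # words 7..16, so 3 words overlap
    exact = sum(v * doc_b.get(i, 0) for i, v in doc_a.items())
    approx = estimate_dot(sketch(doc_a, signs), sketch(doc_b, signs))
    print(f"exact={exact} estimate={approx:.3f}")
```

Running the script prints the exact overlap (3) next to an estimate that should land close to it; the spread of the estimate shrinks roughly as one over the square root of the number of sketch components.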
