Privacy preservation for associative classification
Abstract
Privacy preservation is becoming a critical issue for data-mining processes. In practice, data often must be transformed before mining to preserve privacy, but such a transformation can degrade data quality. Its impact on data quality should therefore be estimated and made clear to the user of the transformation process. In this article, we consider the problem of k-anonymization transformation in associative classification. We address the privacy preservation and data quality issues in two ways. First, we propose a frequency-based data quality metric to represent the data quality for associative classification. Second, we propose a novel heuristic algorithm, minimum classification correction rate transformation, which is guided by the classification correction rate of the given datasets. We validate the proposed metric and algorithm on datasets from the University of California, Irvine (UCI) repository. The experimental results show that the proposed metric effectively represents the data quality for associative classification, and that the proposed algorithm is both efficient and highly effective.
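For readers unfamiliar with the privacy constraint, the following is a minimal sketch of the standard k-anonymity condition only: every combination of quasi-identifier values must occur in at least k records. It is not the algorithm or the metric proposed in this article. The function name `is_k_anonymous`, the attribute names, and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier value combination occurs in at least k rows."""
    # Count how many rows share each combination of quasi-identifier values.
    counts = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical, already-generalized records (illustrative only).
rows = [
    {"age": "30-39", "zip": "476**", "label": "yes"},
    {"age": "30-39", "zip": "476**", "label": "no"},
    {"age": "40-49", "zip": "479**", "label": "yes"},
]

# The ("40-49", "479**") group contains a single record, so the table
# satisfies the condition for k = 1 but not for k = 2.
print(is_k_anonymous(rows, ["age", "zip"], 1))
print(is_k_anonymous(rows, ["age", "zip"], 2))
```

A transformation would typically generalize or suppress quasi-identifier values until such a check passes, which is where the data quality trade-off discussed above arises.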