Demystifying data deduplication

Nagapramod Mandagere; Pin Zhou; Mark A. Smith; Sandeep Uttamchandani

doi:10.1145/1462735.1462739

Publication

Middleware 2008

Conference paper

Abstract

The effectiveness and tradeoffs of deduplication technologies are not well understood: vendors tout deduplication as a "silver bullet" that can help any enterprise optimize its deployed storage capacity. This paper aims to provide a comprehensive taxonomy and experimental evaluation using real-world data. While the rate of change of data on a day-to-day basis has the greatest influence on the duplication in backup data, we investigate the duplication inherent in this data, independent of the rate of change of data, the backup schedule, or the backup algorithm used. Our experimental results show that, across different deduplication techniques, space savings vary by about 30%, CPU usage differs by almost 6 times, and the time to reconstruct a deduplicated file can vary by more than 15 times.

Date

01 Dec 2008

Authors

IBM-affiliated at time of publication
