Effect of data placement on the reliability of data storage systems
Abstract
Data redundancy, in the form of replication or advanced erasure codes, is used to protect data from storage node failures. It is known that the placement of this redundant data across storage nodes can have a significant impact on reliability, especially in large-scale storage systems. In particular, a declustered placement of redundant data is shown to achieve significantly higher reliability than the traditionally used clustered placement for many redundancy schemes. This implies that substantial reliability gains can be obtained without sacrificing storage efficiency simply by choosing the declustered placement scheme. Approximate expressions for the mean time to data loss (MTTDL) of the system, in terms of its various parameters, are obtained by considering the shortest paths to data loss when node failures occur and rebuild processes commence. Detailed event-driven simulations show that these expressions hold for parameters of practical interest. © 2013 IEEE.
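As a minimal illustration of the shortest-path idea, the sketch below covers only the simplest case: clustered r-way replication, where n nodes are split into n/r disjoint replica groups. It assumes exponentially distributed node failures at rate λ (`lam`) and a single rebuild process at rate μ (`mu`) that restores one replica at a time. These modeling choices and all function names are hypothetical and are not the expressions derived in this work. The exact mean time to data loss (MTTDL) of one group follows from a birth-death chain, while the shortest-path approximation keeps only the path on which all r replicas fail before any rebuild completes, giving MTTDL ≈ μ^(r-1) / (r! λ^r) when μ is much larger than λ.

```python
from math import factorial


def mttdl_group_exact(r: int, lam: float, mu: float) -> float:
    """Exact MTTDL of one r-way replica group under an assumed birth-death model.

    State i = number of failed replicas (0..r-1); reaching state r means data loss.
    Failures occur at rate (r - i) * lam; rebuilds complete at rate mu, one at a time.
    """
    total = 0.0
    e_prev = 0.0  # expected first-passage time from state i-1 to state i
    for i in range(r):
        birth = (r - i) * lam          # another surviving replica fails
        death = mu if i > 0 else 0.0   # a rebuild completes (nothing to rebuild in state 0)
        # Expected time to first reach state i+1 starting from state i.
        e_i = (1.0 + death * e_prev) / birth
        total += e_i
        e_prev = e_i
    return total


def mttdl_group_approx(r: int, lam: float, mu: float) -> float:
    """Shortest-path approximation: all r replicas fail before any rebuild (mu >> lam)."""
    return mu ** (r - 1) / (factorial(r) * lam ** r)


def mttdl_clustered_system(n: int, r: int, lam: float, mu: float) -> float:
    """Approximate system MTTDL for n nodes in n // r independently failing groups."""
    return mttdl_group_approx(r, lam, mu) / (n // r)


if __name__ == "__main__":
    lam = 1e-5  # hypothetical: node mean time to failure of 100,000 hours
    mu = 0.1    # hypothetical: 10-hour rebuild time
    for r in (2, 3):
        exact = mttdl_group_exact(r, lam, mu)
        approx = mttdl_group_approx(r, lam, mu)
        print(f"r={r}: exact={exact:.4e} h, approx={approx:.4e} h")
```

For these hypothetical parameters the approximation stays within a fraction of a percent of the exact chain, which is why keeping only the shortest path is reasonable when rebuilds are much faster than failures. Declustered placement, where each node's replicas are spread over many other nodes, is not modeled in this sketch.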