Maintaining good performance in disk arrays during failure via uniform parity group distribution
Abstract
Disk arrays are increasingly used, including in distributed computing systems, to provide reliable, high-performance data storage. When a disk in a RAID5 array fails, the data on the failed disk remains available through parity reconstruction, which reads from the other disks. However, this places an increased load on the surviving disks, and if this consequence of failure is not taken into account, system performance may degrade to an unacceptable level. This paper describes techniques that enable a disk array to maintain good performance in the event of a disk failure. After a failed disk has been repaired, its contents must be reconstructed from all the associated parity groups. In RAID5, this must be a single-threaded sequential process. With the techniques introduced in this paper, it is shown how this sequential process can be broken down into multiple parallel processes distributed throughout the array, thus shortening the reconstruction time. Although the techniques are applied here to disk arrays, they may have general applications in other areas of distributed computing.
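As a minimal illustration of the parity reconstruction mentioned above, the following Python sketch rebuilds one lost block of a single parity group by XORing the surviving blocks with the parity block. It shows only the standard XOR parity relation used in RAID 5-style arrays, not the distribution techniques of this paper; the block contents, the group size, and the helper name `xor_blocks` are assumptions chosen for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical parity group: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. Its block is recovered by
# reading the surviving data blocks and the parity block and XORing them.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Every surviving member of the parity group must be read to recover one lost block, which is why a failure increases the load on the remaining disks.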