A Method to Implement Clustered RAID

 

Siu-Cheung Chau
Dept. of Mathematics and Computer Science, University of Lethbridge,
Lethbridge, Alberta, Canada, T1K 3M4
e-mail: chau@cs.uleth.ca

Ada Wai-Chee Fu
Dept. of Computer Science and Engineering,
Chinese University of Hong Kong, Shatin, Hong Kong

Abstract

A Redundant Array of Independent Disks (RAID) can tolerate the failure of a single disk by adding one disk to store the parity of the other disks. If a disk fails, the surviving disks and the parity disk can be used to reconstruct the data in the failed disk. However, the failure of a disk will transfer 100% of the load to each of the other disks. To minimize
the load increase, a clustered RAID is proposed by Muntz and Lui [1] to solve the problem of double in load when a disk failure occurs. Instead of having only one parity group, data is divided into p parity groups. If the parity groups are distributed evenly among all disks, the failure of one disk will transfer a smaller percentage of the load to each of the surviving disks. This is an improvement over the 100% transfer in load for ordinary RAID. Hence, if an efficient method to evenly distribute the parity groups can be found, the clustered RAID method will be much better than the RAID method. Let d be the number of disks available, p be the number of parity groups, and g be the size of each
parity group. In this paper, we propose an efficient method to evenly distribute p parity groups where p is a prime number. Our method is a table look-up implementation where the look-up table is of size d(p+1)plogpd-1. After the table is set up, only a few calculations are required to determine the data blocks that belong to the same parity group. We show that our method is optimal for d >= p2 in terms of even distribution and work load increase.

Basic Approach

We propose the use of a small look-up table to store the distribution of the parity groups. We shall call it a parity groups' distribution look-up table. We shall show that for a write operation, we only have to perform a few calculations to determine which row of the table we should use. Since the write operation takes much longer time compared to a few processor operations, the increase in access time would be negligible. Furthermore, the small look-up table can be stored in the disk controller level.

The look-up table is constructed by using circulant matrices. A circulant matrix is a matrix of size n by n. Each column consists of the integers from 1 to n. Once the value of the first column is chosen, the values of the elements for the next column is obtained by performing a rotation with respect to the previous column. The rotation is done by shifting the value of the element at row i+1 to row i for i in {2..n} and shifting the value of the element at the first row to the last row. The first column of a circulant matrix is either the column {1, 2, 3, ...n}-1 or the result of a number of rotations with respect to this column. For example, the following are 3 by 3 circulant matrices.


1 2 3 2 3 1 3 1 2
2 3 1 = M3,1 3 1 2 = M3,2 1 2 3=M3,3
3 1 2 1 2 3 2 3 1

 

If the values of the first column of a circulant matrix are all distinct and consist of values from 1 to n, the values stored in each row are all distinct, too. Furthermore, the resulting matrix will have the property that each value appears only once in each row or column.

Lemma 1: A circulant matrix with distinct value in its first column will have distinct values in each row and each column.

With the three circulant matrices M3,1, M3,2, M3,3, and three other matrices I3,1, I3,2, and I3,3, we can construct the parity groups' distribution look-up table for 9 disks with 3 parity groups. The matrices I3,1, I3,2, and I3,3, are listed below.


1 1 1 2 2 2 3 3 3
2 2 2 = I3,1 3 3 3 = I3,2 1 1 1 = I3,3
3 3 3 1 1 1 2 2 2

 

The look-up table is constructed by combining these matrices together. In this table, an entry of i (1 <= i <= p) represents a block of data belonging to parity group i. The look-up table for p=3 and d=9 is listed below. The entries of i's in a row are together called a parity group unit. A parity group unit represents a set of data blocks in different disks plus the parity block in yet a different disk which carries the parity of the data in the data locks.


M3,1 M3,1 M3,1
M3,1 M3,2 M3,3
M3,1 M3,3 M3,2
I3,1 I3,2 I3,3


Disks
1 2 3 4 5 6 7 8 9

block (row) -------------------------------------

0 1 2 3 1 2 3 1 2 3
1 2 3 1 2 3 1 2 3 1
2 3 1 2 3 1 2 3 1 2

3 1 2 3 2 3 1 3 1 2
4 2 3 1 3 1 2 1 2 3
5 3 1 2 1 2 3 2 3 1

6 1 2 3 3 1 2 2 3 1
7 2 3 1 1 2 3 3 1 2
8 3 1 2 2 3 1 1 2 3

9 1 1 1 2 2 2 3 3 3
10 2 2 2 3 3 3 1 1 1
11 3 3 3 1 1 1 2 2 2

 

Suppose disk 4 has failed. A write operation to block (row) 5 of disk 4 requires a read or write operation to disk 2 and disk 9. For disk 1 and disk 7, the extra load will occur when blocks 0, 1, or 2 of disk 4 are being accessed. For disk 2 and 9, it will occur when blocks 3, 4, or 5 are being accessed. For disk 3 and disk 8, it will occur when blocks 6, 7, or 8 are being accessed. Finally for disk 5 and 6, the extra load will occur when blocks 9, 10, or 11 are being accessed. That is, the extra load will be equal to 3/12 for each surviving disk. The increase in load to each of the other disks is equal to 25%.

The parity groups' distribution look-up table can be used for other blocks besides block 0 to block 11. We can use the look-up table for blocks 12 to 23, 24 to 35 and so on. After a disk failure, there will be 3/12 increase in load in each segment for each surviving disk. Furthermore, it is quite easy to determine which row of the look-up table should be used for a particular block. For data in block i, only one operation, i mod 12 is necessary to determine which row in the look-up table should be used. The size of the look-up table is also independent of the number of blocks in a disk.

A general scheme to find parity groups' distribution look-up tables based on the above findings can be developed. For example, the look-up table M25,1 for p=5 and d=25, is listed below.


M5,1 M5,1 M5,1 M5,1 M5,1
M5,1 M5,2 M5,3 M5,4 M5,5
M5,1 M5,3 M5,5 M5,2 M5,4
M5,1 M5,4 M5,2 M5,5 M5,3
M5,1 M5,5 M5,4 M5,3 M5,2
I5,1 I5,2 I5,3 I5,4 I5,5

 

References

1 R. Muntz and J. Lui, Performance Analysis of Disk Arrays under Failure. In Proceedings of the 16th Conference of Very Large Data Bases, Pages 162-173, 1990.