Maximizing Availability of Quorum-Based Mutual Exclusion Mechanisms

Tatsuhiro Tsuchiya     Tohru Kikuno 1 
Department of Informatics and Mathematical Science
Osaka University

1. Introduction. The distributed mutual exclusion problem is to guarantee that at most one computing node can enter a critical section at a time. This problem is widely recognized as one of the most fundamental problems in distributed computing since it arises in various kinds of distributed systems. A number of mutual exclusion mechanisms have been proposed so far. Among them, quorum-based mechanisms are known to be especially robust. Their applications include replicated database systems, name servers, emulation of shared memory, and others. 

In order to achieve mutual exclusion, a quorum-based mechanism uses a special set of node groups, called a coterie[2]. In a coterie, any two node groups have at least one node in common (intersection property), and any node group is not a superset of other node groups (minimality). Node groups in a coterie are called quorums. For instance, when v1, v2, v3 are nodes, C = { {v1, v2}, {v1, v3}, {v2, v3} } is an example of a coterie. 
 
Once a coterie was determined, mutual exclusion can be achieved as follows: Before entering the critical section, a node has to acquire permission from all the nodes in at least one quorum. Each node is allowed to give its permission to at most one node. Then, by the intersection property, it is guaranteed that at most one node can enter the critical section at a time. In the presence of node and link failures, quorum-based mechanisms provide desirable fault-tolerance capability: even when the system is partitioned into isolated node groups, called partition groups, due to failures, mutual exclusion is still achieved if one of the isolated node groups includes a quorum. For instance, consider a network G=(V, E) shown in Figure 1(a). When a node v5 in V and an edge e4,6 in E have failed, the configuration of the network becomes as shown in Figure 1(b). In this case, two node groups {v1, v2, v3, v4} and {v6} form partition groups. If there is a quorum included by one of the partition groups, there remain some nodes that can enter the critical section. 

Since the availability of a mutual exclusion mechanism is defined as the probability that the critical section can be entered, it equals the probability that there is a quorum whose nodes are operational and connected. It is then clear that the availability depends completely on the coterie adopted by the mechanism. Therefore we use the term availability of a coterie as availability of the mutual exclusion mechanism, interchangeably. 

We have been conducting research on evaluation and maximization of availability of coteries on unreliable networks with arbitrary topologies. Specifically, 

  • we develop a method for evaluating availability of any coteries, and
     
  • we also develop a method for obtaining optimal coteries that maximize availability.

Except for theoretical work in [3], these results are the first ones that address the issues on availability of coteries on arbitrary networks. Due to the limit of space, we outline only the method for availability maximization in the following. 

2. Finding Optimal Coteries. Much research has been conducted for finding coteries that achieve high availability. For special networks, such as fully connected networks with reliable links, trees, and rings, methods of obtaining optimal coteries have been proposed. However, no method has been developed for networks with arbitrary topologies. 

We tackle this problem by extending Tang and Natarajan's integer-programming approach[4]. They formulated this problem into a 0-1 integer programming problem. In this approach, for every node group, the probability that it becomes a partition group has to be obtained in order to formulate the problem. However, any analytic method for calculating the probability has not been developed. To overcome this deficiency, we propose a new algorithm to compute the probability that each node group forms a partition group. 

A straightforward method for computing this probability for every node group is to examine all possible configurations of the network. However, since the total number of possible configurations is equal to 2|V|+|E|, we can easily conjecture that this method would take very long running time to accomplish computation. 

To achieve computation more effectively, the proposed algorithm examines subgraphs of G, instead of configurations themselves. Notice that partition groups appearing in a configuration are exactly the same as those appearing in its corresponding maximal subgraph. The proposed algorithm accomplishes computation by doing the following for every subgraph of G: First, this algorithm determines all partition groups in the subgraph. Then, it computes the probability that the current configuration corresponds to that subgraph, and accumulates the probability for the partition groups. 

Using this algorithm, we have succeeded in obtaining optimal coteries on various networks. (We used a commercial software tool called Lindo for solving the integer programming problem.) To the best of our knowledge, these optimal coteries are the first ones obtained for general networks with unreliable nodes and links. For example, an optimal coterie and the optimal value of the availability on the network in Figure 1 are as follows: 

{ {v3, v4}, {v2, v3, v5}, {v4, v5}, {v2, v4, v6}, {v3, v5, v6} }

Availability  =  0.9646616

(For simplicity, we assume that all components have the same reliability equal to 0.9.)   3. Experimental Results. In this section, we present the results of an experiment we conducted for various networks. Figure 2 shows the networks used in this experiment. We assumed that the reliability of every node is 0.90 and that of every link is 0.95 just for simplicity. 

Table 1 summarizes unavailabilities of optimal coteries we obtained. Here we define unavailability as 1 - availability. For comparison, we also evaluated availabilities (and unavailabilities) of heuristically-designed coteries using an evaluation method we developed. We chose two heuristic schemes proposed by Barbara and Garcia-Molina[1] and Tong and Kain[5]. The results of this evaluation are also shown in Table 1. 

 Table 1. Unavailabilities of coteries.

Network

Optimal

B&G

T&K

#1

1.6169 x10-2

1.8081 x10-2

1.8081 x10-2

#2

1.0293 x10-2

1.1337 x10-2

1.0895 x10-2

#3
1.3753 x10-2

1.9543 x10-2

1.5193 x10-2

#4

2.2963 x10-2

3.8274 x10-2

3.1616 x10-2

#5

1.9795 x10-2

2.6282 x10-2

2.6282 x10-2

#6

1.4605 x10-2

3.2469 x10-2

3.2996 x10-2

From this table, it can be seen that the two heuristic schemes achieve 
almost the same availability in all cases. In contrast, the difference in unavailability between the optimal coterie and the heuristically-constructed coteries becomes larger when the size of the networks grows. Especially for Network 6, the unavailability of the optimal coterie is only 45 percent of that of B&G coterie. This fact implies that the risk of the mutual exclusion mechanism being down can be reduced by half by adopting an optimal coterie. Besides, this improvement of availability is obtained without introducing any additional redundancy. Thus we can conclude that choosing an appropriate coterie is a very significant task for achieving reliable mutual exclusion.

 

References

[1] D.Barbara and H.Garcia-Molina, "The reliability of voting mechanism,'' IEEE Trans. Computers, Vol. C-36, No. 10, pp.1197-1207, 1987. 
[2] H.Garcia-Molina and D.Barbara, "How to assign votes in a distributed system,'' J. ACM, Vol. 32, No. 4, pp.841-860, 1985. 
[3] T.Harada and M.Yamashita, "Nondominated coteries on graphs," IEEE Trans. Parallel and Distributed Systems, Vol. 8, No. 6, pp.667-672, 1997. 
[4] J.Tang and N.Natarajan, "Obtaining coteries that optimize the availability of replicated databases,'' IEEE Trans. Knowledge and Data Engineering, Vol. 5, No. 2, pp.309-321, 1993. 
[5] Z.Tong and R.Y.Kain, "Vote assignments in weighted voting mechanisms,'' IEEE Trans. Computers, Vol. 40, No. 5, 1991. 
 


1. Author contact: Department of Informatics and Mathematical Science, Osaka University. 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan. Phone: +81-6-850-6567. Fax: +81-6-850-6567. E-Mail: t-tutiya@ics.es.osaka-u.ac.jp