Maximizing Availability of Quorum-Based Mutual Exclusion Mechanisms
-
- Tatsuhiro Tsuchiya Tohru Kikuno 1
- Department of Informatics and Mathematical Science
- Osaka University
1. Introduction.
The distributed mutual exclusion problem is to guarantee that at most one
computing node can enter a critical section at a time. This problem is widely
recognized as one of the most fundamental problems in distributed computing
since it arises in various kinds of distributed systems. A number of mutual
exclusion mechanisms have been proposed so far. Among them, quorum-based
mechanisms are known to be especially robust. Their applications include
replicated database systems, name servers, emulation of shared memory, and
others. In order to achieve mutual exclusion, a
quorum-based mechanism uses a special set of node groups, called a coterie[2]. In a coterie, any two node groups have at least
one node in common (intersection property), and any node group is
not a superset of other node groups (minimality). Node groups in
a coterie are called quorums. For instance, when v1,
v2, v3 are nodes, C = { {v1,
v2}, {v1, v3}, {v2,
v3} } is an example of a coterie.
Once a coterie was determined, mutual exclusion can be achieved as follows:
Before entering the critical section, a node has to acquire permission from
all the nodes in at least one quorum. Each node is allowed to give its permission
to at most one node. Then, by the intersection property, it is guaranteed
that at most one node can enter the critical section at a time. In the presence
of node and link failures, quorum-based mechanisms provide desirable fault-tolerance
capability: even when the system is partitioned into isolated node groups,
called partition groups, due to failures, mutual exclusion is still
achieved if one of the isolated node groups includes a quorum. For instance,
consider a network G=(V, E) shown in Figure 1(a). When
a node v5 in V and an edge e4,6
in E have failed, the configuration of the network becomes as shown
in Figure 1(b). In this case, two node groups {v1, v2,
v3, v4} and {v6} form
partition groups. If there is a quorum included by one of the partition
groups, there remain some nodes that can enter the critical section.
Since the availability of a mutual exclusion mechanism is defined as
the probability that the critical section can be entered, it equals the
probability that there is a quorum whose nodes are operational and connected.
It is then clear that the availability depends completely on the coterie
adopted by the mechanism. Therefore we use the term availability of a
coterie as availability of the mutual exclusion mechanism, interchangeably.
We have been conducting research on evaluation and maximization of availability
of coteries on unreliable networks with arbitrary topologies. Specifically,
- we develop a method for evaluating availability
of any coteries, and
- we also develop a method for obtaining optimal
coteries that maximize availability.
Except for theoretical work in [3], these results
are the first ones that address the issues on availability of coteries on
arbitrary networks. Due to the limit of space, we outline only the method
for availability maximization in the following. 2. Finding Optimal Coteries. Much research
has been conducted for finding coteries that achieve high availability.
For special networks, such as fully connected networks with reliable links,
trees, and rings, methods of obtaining optimal coteries have been proposed.
However, no method has been developed for networks with arbitrary topologies.
We tackle this problem by extending Tang and Natarajan's
integer-programming approach[4]. They formulated
this problem into a 0-1 integer programming problem. In this approach, for
every node group, the probability that it becomes a partition group has
to be obtained in order to formulate the problem. However, any analytic
method for calculating the probability has not been developed. To overcome
this deficiency, we propose a new algorithm to compute the probability that
each node group forms a partition group. A straightforward
method for computing this probability for every node group is to examine
all possible configurations of the network. However, since the total number
of possible configurations is equal to 2|V|+|E|,
we can easily conjecture that this method would take very long running time
to accomplish computation. To achieve computation
more effectively, the proposed algorithm examines subgraphs of G,
instead of configurations themselves. Notice that partition groups appearing
in a configuration are exactly the same as those appearing in its corresponding
maximal subgraph. The proposed algorithm accomplishes computation by doing
the following for every subgraph of G: First, this algorithm determines
all partition groups in the subgraph. Then, it computes the probability
that the current configuration corresponds to that subgraph, and accumulates
the probability for the partition groups. Using
this algorithm, we have succeeded in obtaining optimal coteries on various
networks. (We used a commercial software tool called Lindo for solving
the integer programming problem.) To the best of our knowledge, these optimal
coteries are the first ones obtained for general networks with unreliable
nodes and links. For example, an optimal coterie and the optimal value of
the availability on the network in Figure 1 are as follows:
{ {v3, v4}, {v2,
v3, v5}, {v4, v5},
{v2, v4, v6}, {v3,
v5, v6} }
Availability = 0.9646616
(For simplicity, we assume that all components have the same reliability
equal to 0.9.) 3. Experimental Results.
In this section, we present the results of an experiment we conducted for
various networks. Figure 2 shows the networks used in this experiment. We
assumed that the reliability of every node is 0.90 and that of every link
is 0.95 just for simplicity.
Table 1 summarizes unavailabilities of optimal coteries we obtained.
Here we define unavailability as 1 - availability. For comparison, we also
evaluated availabilities (and unavailabilities) of heuristically-designed
coteries using an evaluation method we developed. We chose two heuristic
schemes proposed by Barbara and Garcia-Molina[1]
and Tong and Kain[5]. The results of this evaluation
are also shown in Table 1.
Table 1. Unavailabilities of coteries.
| Network |
Optimal |
B&G |
T&K |
| #1 |
1.6169 x10-2 |
1.8081 x10-2 |
1.8081 x10-2 |
| #2 |
1.0293 x10-2 |
1.1337 x10-2 |
1.0895 x10-2 |
| #3 |
1.3753 x10-2 |
1.9543 x10-2 |
1.5193 x10-2 |
| #4 |
2.2963 x10-2 |
3.8274 x10-2 |
3.1616 x10-2 |
| #5 |
1.9795 x10-2 |
2.6282 x10-2 |
2.6282 x10-2 |
| #6 |
1.4605 x10-2 |
3.2469 x10-2 |
3.2996 x10-2 |
From this table, it can be seen that the two heuristic schemes achieve
almost the same availability in all cases. In contrast, the difference in
unavailability between the optimal coterie and the heuristically-constructed
coteries becomes larger when the size of the networks grows. Especially
for Network 6, the unavailability of the optimal coterie is only 45 percent
of that of B&G coterie. This fact implies that the risk of the mutual
exclusion mechanism being down can be reduced by half by adopting an optimal
coterie. Besides, this improvement of availability is obtained without introducing
any additional redundancy. Thus we can conclude that choosing an appropriate
coterie is a very significant task for achieving reliable mutual exclusion.
References
[1] D.Barbara and H.Garcia-Molina, "The reliability
of voting mechanism,'' IEEE Trans. Computers, Vol. C-36, No. 10,
pp.1197-1207, 1987.
[2] H.Garcia-Molina and D.Barbara, "How to assign votes
in a distributed system,'' J. ACM, Vol. 32, No. 4, pp.841-860, 1985.
[3] T.Harada and M.Yamashita, "Nondominated coteries
on graphs," IEEE Trans. Parallel and Distributed Systems, Vol.
8, No. 6, pp.667-672, 1997.
[4] J.Tang and N.Natarajan, "Obtaining coteries that
optimize the availability of replicated databases,'' IEEE Trans. Knowledge
and Data Engineering, Vol. 5, No. 2, pp.309-321, 1993.
[5] Z.Tong and R.Y.Kain, "Vote assignments in weighted
voting mechanisms,'' IEEE Trans. Computers, Vol. 40, No. 5, 1991.
|