The Byzantine Generals Problem: Identifying the Traitors
- B. Ayeb and A. Farhat
- University of Sherbrooke
- DMI-- RiFa Lab
There is a variety of algorithms
proposed in the literature addressing the Byzantine problem. Their common
objective is to mask the effect of faulty units and reach an agreement
among fault free units. This ongoing work aims at identifying faulty
units while the process of agreement is running. Beyond theoretical results,
such a identification of faulty units contributes to:
- accelerate the agreement process itself.
- reduce drastically the number of messages.
The overhead for identifying faulty units
is rather not expensive, it is in O(n^3), where n denotes the number of
units. We assume
that units communicate with oral messages within a synchronous and reliable
communication system. Finally, a faulty unit could behave arbitrarily
(i.e., the Byzantine fault model), whereas t denotes the number of faulty
units and does not exceed n/3. The proposed mechanism for the identification
of faulty units uses the following prosecution principle depicted on Figure
1.
unit q -------value v-------> unit p
\ /
\ /
\ value v'
\ /
\ /
\-----unit r------/
Figure 1
Suppose that unit p receives directly
a value, say v, from unit q and receives a second value, say v', via
unit r. If v and v' differ then unit p suspects the couple of units q and
r.
Using the collected couples, unit p
could identify the faulty units in the following way. Whenever, a unit,
say q, occurs more than t times then unit p declares q as faulty . In case
where unit q occurs k times then a simple procedure in O(n^3) ensures
to unit p if it could declare unit q as faulty. Finally, whenever
unit p declares that unit q is faulty, it stops to communicate with q. It
has been shown that this does not affect the process of agreement.
Now suppose that unit p owns its diagnosis
(i.e., a subset of faulty units) and another unit, say r, has also
its own diagnosis. We propose a safe protocol for diagnoses exchange, where
units p and r could make profit each from other. Note that the protocol
takes into account the fact that a faulty unit tries to send a non
correct diagnosis.
As one could expect, without additional constraints
it is impossible to ensure that all faulty units are identified. That why
we state the necessary and sufficient conditions for the identification
of all and only faulty units. Succinctly, the conditions specifies that
all faulty units should prosecuted approximately from t+1 to n^2 times.
Considering the fact that the number of exchanged messages is in O(n^t),
the requirement is rather reasonable.
By designing a grouped architecture
of units, we also succeed in designing an efficient algorithm for the identification
of all faulty units without additional constraints. In fact, the identification
of faulty units makes profit from a well suited broadcast primitive
as well as the distributed architecture of units into a collection of communicating
groups.
On going work includes theoretical analysis,
analytical comparison with related work [Sh 87, Co 93, Ch 96], and experimentation.
Current work aims at measuring the actual profits, namely in terms
of messages exchange, and compares the running time of agreement with (rep.
without) faults identification. First experimentation's shows an appreciable
gain but further experimentation should shed more lights.
References
[Sh 87] K. G. Shin &
P. Ranmathan "Diagnosis of Processors with Byzantine Faults in a Distributed
Computing System," 55-60, Proceedings of the FTCS 17, IEEE Computer
Society Press, 1987.
[Co 93] B. A. Coan "Efficient
Agreement Using Fault Diagnosis," Distributed Computing,
(7): 87-98, Springler Verlag, 1993
[Ch 96] T. D. Chandra & S. Toueg
"Unreliable Failure Detectors for Reliable Distributed Systems,"
JACM, 43(2): 225-267, ACM Press, 1996.
[Ay 98] B. Ayeb & A. Farhat
"The Byzantine Problem: Masking and Demasking Faults ,"
Under Press, 1998.
|