Formal Validation of Fault-tolerance Mechanisms

Cinzia Bernardeschi(1), Alessandro Fantechi(2),
Stefania Gnesi(3), Antonella Santone (1)

1. Dipartimento di Ingegneria della Informazione, Univ. di Pisa
2. Dipartimento di Sistemi e Informatica, Univ. di Firenze
3. Istituto di Elaborazione dell'Informazione, Consiglio Nazionale delle Ricerche

The European project GUARDS (Generic Upgradable Architecture for Real-time Dependable Systems) addresses the development of architectures, methods, techniques, and tools to support the design, the implementation and validation of critical systems [WBBP96]. The innovative approach of the project towards this objective lies on the elaboration and exploitation of a coherent methodology integrating five fundamental features: Genericity, to support reliability of hardware and software components and architectures in multiple applications and domains, in particular in space, nuclear and railway fields; Dependability, to support the design and implementation of fault tolerant mechanisms complying with safety and availability requirements; Real-time predictability, to support the fulfillment of the stringent real-time constraints imposed by the considered applications; Ability to be validated, to support the definition of instances of the generic architecture that comply with the requirements imposed by each of the target applications; Ability to be certified, to meet the certification obligations imposed by various standardisation authorities. The validation policy in GUARDS is based on the a priori validation of selected critical components and mechanisms of the architecture [Arlat97].

We report the experience done to formally validate a particular fault-tolerance mechanism of GUARDS, the Inter-consistency mechanism. This mechanism is responsible of the consistency of the outputs produced by the channels (redundant processors), also in presence of faults in a minority of them. The formal validation of the mechanism has been carried out following an approach based on model checking. Model checking approaches [CES86] work on a finite state model of the behavior of the system, derived by the specification of the system following an operational approach. Verification is carried out automatically by expressing a desired property of the system as a temporal logic formula, and by checking the satisfiability of the formula in a polynomial time complexity over the model of the system.

The formal verification of the Inter-consistency mechanism requires that the given specification satisfies some correctness properties also in presence of faults. Faults are formalised as actions, whose occurrence simulates the occurrence of the fault in the system. In particular, the specification of faults in terms of actions appears adherent to the original description of the fault tolerant mechanism to be validated; and gives us control over the occurrence of faults.

The automatic verification of the mechanism is done according to the following steps:
1. The components of the mechanism are specified (as process algebra terms or alternatively by means of Labelled Transition Systems (LTSs) and networks of LTSs).
2. Processes that constraint the actual occurrence of fault actions, according to the considered fault hypothesis, are added to the specification.
3. The global model of the behaviour of the Inter-consistency mechanism under the fault hypothesis, as a single LTS, is generated.
4. The properties that express the correctness or fault tolerance requirements on the mechanism are specified by ACTL temporal logic [DNV90] formulae.

The tools of the JACK ( Just Another Concurrency Kit) environment [BGL94] support both the specification and the verification phase. Detailed information about JACK are available

The validation experience on the Inter-consistency mechanism has first addressed an instance of this mechanism composed of three communicating nodes. Covering a mechanism with four nodes case has required writing a new specification of the nodes of the network. In this case the number of messages and the number of variables increase for each node. Moreover, since the number of nodes also increases, the increment of the number of states of the global model of the mechanism is quite large. This is a typical state explosion problem case, and indeed the standard tools in the JACK environment fail to work on such a large model for lack of memory. We have therefore employed methods of compactly representing the state space by an implicit description based on Binary Decision Diagram (BDD) in the development of new efficient model checking tools. In particular, a model checker has been recently built for ACTL, based on symbolic algorithms and implicit representation of the labeled transition relation of LTSs [BAMC]. Work is in progress on the application of these new model checking tools to check the Agreement and Validity properties for the four node case.

Given the validation effort done for the three nodes case, we could expect that the validation for the four nodes case could exploit the already built model, by performing some kind of incremental extension, that would avoid to generate again the state space global model from scratch. Unfortunately, this is not possible due to the structure of the mechanism itself: validation has to be repeated on each instance of the Inter-consistency mechanism generic definition, generating each time the global state space.

On the other hand, a validation of the generic, n-node, Inter-consistency algorithm, could be performed using its recursive definition and using validation approaches based on theorem proving [GPR95], rather than model cheching.

In the framework of the GUARDS project, however, we have considered that the GUARDS architecture is going to be exploited by three different pilot applications, namely a nuclear submarin control subsystem, a railway interlocking system, and a space-station payload control system. Of the three cases, the first employs an instance of the GUARDS architecture having just two channels, where the Inter-consistency mechanism does not apply, while the second one employs three channels and the latter four channels. Hence, for the purpose of this project, it has been considered more convenient to perform a model checking based validation of the Inter-consistency algorithm with three and four nodes.

This parallels the fact that during the implementation phase, different running code will be written for the Inter-consistency mechanism with three nodes and with four nodes, since the recursive definition cannot be applied in an embedded system, where determinism of execution time is of paramount importance. This means that the object of our model checking based validation is much closer to the actually running code than to the generic algorithm. This is in our opinion more appealing to end users and regulatory bodies.

The experience carried out in the validation of the fault-tolerance Inter-consistency mechanism confirms the primary role of model checking approaches inside formal validation. On the basis of this experience, we can envisage model checking as a basic approach to the validation of particular instances of systems. In addition to having the role of automating and speeding up subproofs in a theorem proving approach to formal validation, as exemplified in [LR93]. On the other hand, theorem proving could be used instead to validate ad hoc abstraction techniques that are often necessary to deal with state explosion problems. The complementarity of the two approaches will be exploited in the next future to gain a larger industrial applicability of formal methods to cope with the ever more challenging validation requirements of the future safety critical computer applications.

References

[Arlat97] J. Arlat. Preliminary definition of the GUARDS validation strategy. GUARDS Deliverable D3O1, LAAS -CNRS Report 96378, January 1997.

[BGL94] A. Bouali, S. Gnesi, S. Larosa. The integration Project for the JACK Environment. Bulletin of the EATCS, 54, October 1994, pp.\ 207--223.

[CES86] E.M. Clarke, E.A. Emerson, A.P. Sistla. Automatic Verification of Finite--State Concurrent Systems Using Temporal Logic Specification. ACM Transaction on Programming Languages and Systems, 8 (2), April 198 6, pp. 244--263.

[DNV90] R. De Nicola, F.W. Vaandrager. Actions versus State Based Logics for Transition Systems. In Proc. Ecole de Printemps on Semantics of Concurrency, Lecture Notes in Computer Science vol. 469, Springer, Berlin, 1990, pp. 407-419.

[BAMC] A. Fantechi, S. Gnesi, F. Mazzanti, R. Pugliese, E. Tronci. A Symbolic Model Checker for Action-based CTL. Submitted for publication.

[GPR95] L. Gong, P. Lincoln, J. Rushby. Byzantine Agreement with Authentication: Observations and Applications in Tolerating Hybrid and Link Faults. Proc. 5th Conference on Dependable Computing for Critical Applications (DCCA-5), Urbana-Champaign, Il, USA, 1995.

[LR93] P. Lincoln, J. Rushby. A formally verified algorithm for Interactive Consistency under hybrid fault model. Proc. 523rd International Symposium on Fault-Tolerant Computing (FTCS 23), Toulose, France, pp. 402-411, June 1993.

[WBBP96] A. Wellings, L. Beus-Dukic, A. Burns, D. Powell. Genericity and Upgradability in Ultra-Dependable Real-Time Architectures. Work in Progress Proceedings, Real-Time Systems Symp., Washington D.C. , pp.15-18, IEEE Computer Society Press, December 1996.