A Hybrid Fault Injection Methodology for Real Time Systems

 

A. Benso, P.L. Civera, M. Rebaudengo, M. Sonza Reorda 1 
Politecnico di Torino, Torino, Italy
E-mail: {benso, civera, reba, sonza}@polito.it
A. Ferro
Prodigital S.n.C.,
Isola d'Asti, Italy
E-mail: prodigit@tin.it

 

  1. Introduction 

Society increasingly depends on computing systems, even in areas where a failure can be critical for human safety. Fault injection has emerged as a viable way to study the behavior of computer-based systems when faults occur, and it has been investigated in depth by both academia and industry. Several fault injection techniques have been proposed and evaluated in practice; they can be grouped into simulation-based techniques [1], software-implemented techniques [2], and hardware-based techniques [3]. 

The goal of this note is to present a fault injection system suited to embedded microprocessor-based boards. The system is based on a hybrid (software and hardware) approach: non-intrusive ad hoc hardware (called the controller) monitors the target board bus and performs crucial operations, such as activating the fault injection procedure at the proper fault injection time or triggering time-out conditions, without modifying the target system behavior. The very low intrusiveness of this additional hardware makes the system well suited to real-time applications. 

We present a prototype of a tool implementing the proposed approach on a commercial board based on an M68040 microprocessor. 
 

2. Fault Injection System 
 
The overall Fault Injection system runs on two different units connected by a serial port interface: a host computer and the actual target board. The system exploits the routines available through the built-in ROM Monitor of the target board to implement the communication interface between the two units, to download the code into the target board, and to analyze the system behavior. 

The adopted fault model is the transient single bit-flip fault. This model is frequently used in fault injection tools since it is highly representative of faults occurring in real systems [4]. Nevertheless, the approach can be easily extended to other fault models. Each fault is thus characterized by the following information: 

  • fault injection time: each fault is injected at the assembly level, before the execution of an instruction. The fault injection time is thus expressed in terms of number of instructions executed since the beginning of the target program;
  • fault location: the address of the memory location or the name of the register where the fault has to be injected; 
  • fault mask: the bit mask that selects the bit(s) to be flipped.
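The three fields above can be pictured as a small record, with the bit-flip itself being an XOR of the stored value with the fault mask. The following is only an illustrative sketch: the names `Fault` and `apply_fault`, the 32-bit word width, and the example values are assumptions made for this example and do not come from the tool itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Hypothetical descriptor for one transient bit-flip fault."""
    injection_time: int  # instructions executed before the fault is injected
    location: str        # memory address or register name, e.g. "D3"
    mask: int            # bits set to 1 select the bit(s) to flip


def apply_fault(value: int, fault: Fault, width: int = 32) -> int:
    """Return `value` with the bits selected by the fault mask inverted."""
    return (value ^ fault.mask) & ((1 << width) - 1)


# Example: flip bit 4 of a register that currently holds 0xFF.
fault = Fault(injection_time=1500, location="D3", mask=0x00000010)
corrupted = apply_fault(0x000000FF, fault)
restored = apply_fault(corrupted, fault)  # applying the same mask twice undoes the flip
```

Because XOR is its own inverse, the same mask can also be used to check which bits differ between a faulty and a fault-free value.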

Our technique is best suited to systems whose behavior, in the presence of a given sequence of input stimuli, can be deterministically computed and easily reproduced. 

The fault injection system can be divided into three modules (Fig. 1): 

  • The Fault List Manager generates the fault list to be injected into the target system; the list is generated according to some constraints, to avoid faults outside the boundaries of the code or data segment or after the target program termination.
  • The Fault Injection Manager injects the faults into the target system. This module is controlled by additional hardware and is described in Section 3.
  • The Result Analyzer captures the system output behavior during the Fault Injection experiments, collects the results, and produces a report concerning the whole experiment.

Fig. 1: The Fault Injection environment.
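As a rough illustration of how a Fault List Manager might respect the constraints mentioned above, the sketch below draws random single bit-flip faults whose injection time precedes program termination and whose memory locations fall inside a given data segment. The function name, its parameters, the 50/50 split between registers and memory, and the uniform random selection are all assumptions for this example; the note does not specify how the actual fault list is sampled.

```python
import random


def generate_fault_list(n_faults, program_length, data_start, data_end,
                        registers, word_bits=32, seed=0):
    """Return a list of (injection_time, location, mask) tuples.

    Assumed constraints:
      - injection_time < program_length (no fault after termination)
      - memory locations lie in [data_start, data_end)
      - each mask has exactly one bit set (single bit-flip model)
    """
    rng = random.Random(seed)  # fixed seed keeps the list reproducible
    faults = []
    for _ in range(n_faults):
        injection_time = rng.randrange(program_length)
        if rng.random() < 0.5:
            location = rng.choice(registers)
        else:
            location = hex(rng.randrange(data_start, data_end))
        mask = 1 << rng.randrange(word_bits)
        faults.append((injection_time, location, mask))
    return faults


fault_list = generate_fault_list(
    n_faults=50, program_length=1000,
    data_start=0x1000, data_end=0x2000,
    registers=["D0", "D1"],
)
```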

3. Fault Injection Manager 

The Fault Injection Manager is the most crucial part of the whole Fault Injection System. A hardware controller monitors the processor in order to start the injection of a fault selected from the fault list, or to stop the program execution if a time-out condition has occurred. The following paragraphs describe the controller architecture and its tasks.

3.1  Controller architecture and programming 

The hardware board is connected to the CPU bus and works as a peripheral from the processor's point of view. It is memory mapped, so that the CPU can program and control it through simple memory write and read instructions.

To correctly execute a single fault injection experiment the controller must receive some commands before starting the target program:

  • set_injection: defines the fault injection time, i.e., the number of instructions executed before the fault injection;
  • set_timeout: defines the maximum number of instructions that can be executed before the experiment is stopped;
  • start: the controller begins to count the instructions executed by the processor;
  • stop: the controller becomes idle and waits for other commands to start the next experiment.

The host computer sends the commands to the controller using the serial interface.
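Since the controller is memory mapped, programming it for one experiment amounts to a short sequence of memory writes. The sketch below shows one possible shape of that sequence. The register offsets, the command codes, the base address, and the `write` callback are all hypothetical; the note does not give the controller's actual register map.

```python
# Hypothetical register map of the memory-mapped controller (assumption).
REG_INJECTION_TIME = 0x0
REG_TIMEOUT = 0x4
REG_CONTROL = 0x8

# Hypothetical command codes written to the control register (assumption).
CMD_STOP = 0
CMD_START = 1


def program_controller(write, base, injection_time, timeout):
    """Issue set_injection, set_timeout and start as memory writes.

    `write(address, value)` stands in for a memory write on the target.
    """
    write(base + REG_INJECTION_TIME, injection_time)  # set_injection
    write(base + REG_TIMEOUT, timeout)                # set_timeout
    write(base + REG_CONTROL, CMD_START)              # start


def stop_controller(write, base):
    """Issue the stop command so the controller returns to idle."""
    write(base + REG_CONTROL, CMD_STOP)


# Record the writes instead of touching real hardware.
log = []
record = lambda address, value: log.append((address, value))
program_controller(record, base=0xFFF00000, injection_time=1500, timeout=20000)
stop_controller(record, base=0xFFF00000)
```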

The controller performs two kinds of operations:

  • in off-line mode, it acts as a peripheral device, and it can properly receive and react to read and write commands from the CPU 
  • in on-line mode, it continuously monitors the processor status pins and counts the number of instructions executed since the last start command; when the fault injection time is reached or the time-out condition occurs, it generates the corresponding interrupt to the CPU, implementing the handshake procedure required by the interrupt protocol. 
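The on-line behavior can be approximated in software as a counter with two comparators. The model below is only a behavioral sketch of the description above, not the FPGA design: the class name, the method names, and the string labels for the two interrupts are assumptions, and the bus-level interrupt handshake is not modeled.

```python
class ControllerModel:
    """Behavioral sketch of the controller's instruction counter."""

    def __init__(self):
        self.injection_time = None
        self.timeout = None
        self.count = 0
        self.online = False

    def set_injection(self, n):
        self.injection_time = n

    def set_timeout(self, n):
        self.timeout = n

    def start(self):
        if self.injection_time is None or self.timeout is None:
            raise RuntimeError("set_injection and set_timeout must come first")
        self.count = 0
        self.online = True

    def stop(self):
        self.online = False

    def instruction_executed(self):
        """Call once per executed instruction; returns an interrupt label or None."""
        if not self.online:
            return None
        self.count += 1
        if self.count == self.injection_time:
            return "inject"    # fault injection interrupt
        if self.count > self.timeout:
            self.online = False
            return "timeout"   # time-out interrupt ends the experiment
        return None


ctrl = ControllerModel()
ctrl.set_injection(3)
ctrl.set_timeout(6)
ctrl.start()
events = [ctrl.instruction_executed() for _ in range(10)]
```

In this run the injection interrupt is raised once three instructions have been counted, and the time-out interrupt is raised when the count first exceeds six, after which the model stays idle.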

A Programmable Logic Device guarantees the re-programmability and the flexibility of the controller. The PLD must meet the strict timing requirements of the bus protocol in order to decode the address, read the command, and generate an interrupt. The controller has been realized with two Xilinx FPGAs and some extra logic mounted on a PCB connected to the target application bus.

3.2 Fault Injection 

The controller counts the number of executed instructions by analyzing the processor status pins that indicate the status of the internal execution unit. The controller has been designed for an M68040 microprocessor, but the approach is general, since pins of this kind are available in almost all processors.

As soon as the instruction counter matches the injection time of the fault to be injected, the controller sends an interrupt to the processor. The interrupt handling routine is in charge of injecting the fault. This is the only intrusiveness introduced by our fault injection system. The routine consists of a very limited number of instructions, so its execution can generally be tolerated by a real-time system. 

3.3 Time-out condition 

The controller continuously monitors the internal instruction counter: if its value exceeds a user-defined limit, an interrupt is sent to the processor. The time-out interrupt handling routine terminates the experiment and sends a message on the serial interface. 

4. Conclusions 

In this note we presented a fault injection environment suitable for fault coverage evaluation of microprocessor-based boards. 

During the fault injection experiments, the target application program is executed at speed and faults are injected by an interrupt handler routine triggered by a low-cost extra board, without any modification in the target application code and with minimum intrusiveness in the system behavior. This allows a very high speed in the overall fault injection experiment and makes it suitable for real-time systems. The approach is quite general and flexible, as it is based on common features supported by most microprocessors.

To evaluate the feasibility of the approach in practice, a fault injection environment has been set up for a commercial board based on a Motorola 68040 processor, and it is currently being evaluated on some benchmark applications.

 
References

[1] E. Jenn, J. Arlat, M. Rimen, J. Ohlsson, J. Karlsson, Fault Injection into VHDL Models: the MEFISTO Tool, Proc. FTCS-24, 1994, pp. 66-75
[2] G.A. Kanawati, N.A. Kanawati, J.A. Abraham, FERRARI: A Flexible Software-Based Fault and Error Injection System, IEEE Trans. on Computers, Vol. 44, No. 2, Feb. 1995, pp. 248-260
[3] J. Arlat et al., Fault Injection for Dependability Validation: A Methodology and Some Applications, IEEE Trans. on Software Engineering, Vol. 16, No. 2, Feb. 1990, pp. 166-182
[4] P.K. Lala, Fault Tolerant and Fault Testable Hardware Design, Prentice Hall Int., New York, 1985


1. Contact Author: 
  Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, I-10129, Torino, Italy Phone: +39 11 564 7055. Fax: +39 11 564 7099  E-Mail: sonza@polito.it