A System Level Technique for On-Line Detection of Control Flow Errors

Z. Alkhalifa and V. S. S. Nair
Computer Science and Engineering Department
Southern Methodist University, Dallas, TX, USA
N. Krishnamurthy and J. A. Abraham
Computer Engineering Research Center
University of Texas at Austin, Austin, TX, USA
 zeyad@seas.smu.edu

    Computer systems that are used in high dependability and integrity applications need to be designed with the capability to detect and recover from the errors caused by hardware faults. Since the majority of errors are usually transient and not reproducible, off-line testing will not reliably detect them. Thus, it is imperative that the systems be designed with built-in concurrent error detection and recovery mechanisms.

    Enhanced Control flow Checking using Assertions (ECCA) is a method of processing code by means of adding assertions to the intermediate representation of the original program to guarantee the detection of control flow errors [1]. It is a successor to Control flow Checking using Assertions (CCA) which has been proposed in the past for the concurrent detection of control flow errors [2]. It is implemented to target the intermediate level language. Our choice of an intermediate level language is the Register Transfer Language (RTL) used in gnu's compilers.

    ECCA divides a program into a set of blocks. We define a block as a collection of consecutive Branch Free Intervals (BFI) in which each block has a single entry point and a single exit point. Block size is determined through an analysis of the tradeoff between overhead and coverage. A unique different prime number ID greater then 2 called Block IDentifier (BID) is assigned to each block. Figure 1 demonstrates grouping 3 BFIs within a block and assigning a unique BID to each block. Note that the figure does not represent the RTL representation of code since if and while statements are translated into simple jumps and conditionals.

tex2html_wrap172

   Two assertions are added to each block. The first is the SET assertion placed at the entry point of the block. It is of the following format:

r1:=(r1-BID)*(r2-BID)

r1:= (BID+1)/[(r1+1)/(r1*2+1)] ,

where, r1 and  rare global registers, and BID is a unique prime number greater than 2 generated for each block at compile time. Under correct execution, either  ror  r(at the beginning of the SET assertion) will contain the current BID value. This BID value was set from the previous TEST assertion. After executing the first assignment of the SET assertion,  rshould be zero if no control flow error has occurred. In case of a control flow error,  r will be non zero. In the second assignment, r will be become (BID+1) if  ris zero (no error). However, if  ris non zero a divide by zero exception will be raised from the assignment, thus, indicating that a control flow error has occurred.

    The second assertion is the TEST assertion. It is executed immediately before departing the block. Therefore, it is placed at the exit point of the block. The assertion is as follows:

r1:=(r1-BID)*NEXT1

 r2:=(r1-BID)*NEXT2 ,

where, NEXT1 and NEXT2  represent the BIDs of the permissible blocks. Under correct execution  (r1-BID) will always equal 1, hence, r and r2 will be set to NEXT1 and NEXT2  respectively. If a control flow error occurred, (r1-BID) will not be 1. Therefore, r and r2 will be set to a non prime multiple of the permissible blocks BID's. This in turn will cause the error to be detected at the SET assertion of the next block.
 
     ECCA has the ability to detect all single control flow errors crossing block boundaries. It was previously shown that ECCA [1] has the ability to detect any of the the following control flow error types.

  • A fault causing a branch from the middle of a block to another.
  • A fault causing a branch to the middle of a block.
  • A fault causing a branch to an illegal block.

    The RTL stage of compilation is divided into numerous passes. Each pass performs a certain optimization or function on the compiled code. The further the code travels down the sequence of passes, the more machine dependent it becomes. We now introduce a new pass named the ECCA pass. In this pass all of the analyses and tasks done by ECCA will be performed. In choosing the optimal location for the ECCA pass, we considered going as low as possible bypassing many optimizations and still achieving machine independence. A satisfactory location is between the Second Common Sub Expression Elimination pass and the Stupid Register Allocation pass. We also achieve independence by using the compiler's internal routines for code flow analysis and assertion code generation in ECCA. By implementing ECCA at the RTL level of compilation, we achieve the following.

  • Portability across programming languages and architectures.
  • Reduction in overhead since the placement of the assertions is performed after most of the compiler's optimizations are done.
  • More control over system resources needed for the assertions.

    An evaluation of the high level ECCA (assertions are added to the high level source code rather than the intermediate level code) with a single BFI per block was performed using a software fault injection tool named FERRARI developed at UT Austin [3]. FERRARI supports the following fault models: 

  • AddIF: Address line error resulting in executing a different instruction.
  • AddIF2: Address line error resulting in execution two instructions.
  • AddOF: Address line error when a data operand is fetched.
  • AddOS: Address line error when an operand is stored.
  • DataIF: Data line error when an operand is fetched.
  • DataOF: Data line error when an operand is loaded.
  • DataOS: Data line error when an operand is stored.
  • CndCR: Errors in condition code flags.

Numerous experiments, conducted on a SUN SPARC workstation, were performed on various spec benchmark programs. The experiments were repeated 4000 times for each fault model for each spec benchmark program. Figure 2 shows the reduced percentage of undetected errors as result of using ECCA on Esspresso. Note that almost 100% fault detection has been achieved as a result of using ECCA. The figure also demonstrates the ability of a data error to manifest it self into a control flow error. Other spec benchmark programs showed similar results.

References

1
Z. Alkhalifa and V. S. S. Nair, ``Design of a Portable Control-Flow Checking Technique'', Proceedings of the High-Assurance Systems Engineering Workshop, Aug. 1997, pp 120-123.
2
L. D. McFearin and V. S. S. Nair, ``Control Flow Checking Using Assertions'', Proceedings of IFIP International Working Conference, Jul. 1994, pp 103-112.
3
G. Kanawati, N. Kanawati and J. Abraham, ``FERRARI: A Flexible Software-Based Fault and Error Injection System'', IEEE Transactions on Computers, Feb. 1995, pp 248-260.