A System Level Technique for On-Line Detection of Control Flow
Errors
- Z. Alkhalifa and V. S. S. Nair
- Computer Science and Engineering Department
- Southern Methodist University, Dallas, TX, USA
- N. Krishnamurthy and J. A. Abraham
- Computer Engineering Research Center
- University of Texas at Austin, Austin, TX, USA
- zeyad@seas.smu.edu
Computer systems that are used in high dependability
and integrity applications need to be designed with the capability to detect
and recover from the errors caused by hardware faults. Since the majority
of errors are usually transient and not reproducible, off-line testing will
not reliably detect them. Thus, it is imperative that the systems be designed
with built-in concurrent error detection and recovery mechanisms.
Enhanced Control flow Checking using Assertions
(ECCA) is a method of processing code by means of adding assertions to the
intermediate representation of the original program to guarantee the detection
of control flow errors [1]. It is a
successor to Control flow Checking using Assertions (CCA) which has
been proposed in the past for the concurrent detection of control flow errors
[2]. It is implemented to target the intermediate
level language. Our choice of an intermediate level language is the Register
Transfer Language (RTL) used in gnu's compilers.
ECCA divides a program into a set of blocks. We define
a block as a collection of consecutive Branch Free Intervals
(BFI) in which each block has a single entry point and a single exit point.
Block size is determined through an analysis of the tradeoff between overhead
and coverage. A unique different prime number ID greater then 2 called Block
IDentifier (BID) is assigned to each block. Figure 1 demonstrates grouping
3 BFIs within a block and assigning a unique BID to each block. Note that
the figure does not represent the RTL representation of code since if
and while statements are translated into simple jumps and conditionals.

Two assertions are added to each block. The first is the
SET assertion placed at the entry point of the block. It is of the following
format:
r1:=(r1-BID)*(r2-BID)
r1:= (BID+1)/[(r1+1)/(r1*2+1)]
,
where, r1 and r2 are
global registers, and BID is a unique prime number greater
than 2 generated for each block at compile time. Under correct execution,
either r1 or r2
(at the beginning of the SET assertion) will contain the current
BID value. This BID value was set from the previous TEST assertion. After
executing the first assignment of the SET assertion, r1
should be zero if no control flow error has occurred. In case
of a control flow error, r1 will be
non zero. In the second assignment, r1 will
be become (BID+1) if r1 is
zero (no error). However, if r1 is
non zero a divide by zero exception will be raised from the assignment,
thus, indicating that a control flow error has occurred.
The second assertion is the TEST assertion. It is
executed immediately before departing the block. Therefore, it is placed
at the exit point of the block. The assertion is as follows:
r1:=(r1-BID)*NEXT1
r2:=(r1-BID)*NEXT2
,
where, NEXT1 and NEXT2
represent the BIDs of the permissible blocks. Under correct execution
(r1-BID) will always equal 1, hence, r1
and r2 will be set to NEXT1
and NEXT2 respectively. If a control flow
error occurred, (r1-BID) will not be 1. Therefore,
r1 and r2 will be
set to a non prime multiple of the permissible blocks BID's. This in turn
will cause the error to be detected at the SET assertion of the next block.
ECCA has the ability to detect all single control
flow errors crossing block boundaries. It was previously shown that ECCA
[1] has the ability to detect any of
the the following control flow error types.
- A fault causing a branch from the middle of a block to another.
- A fault causing a branch to the middle of a block.
- A fault causing a branch to an illegal block.
The RTL stage of compilation is divided into numerous
passes. Each pass performs a certain optimization or function on the compiled
code. The further the code travels down the sequence of passes, the more
machine dependent it becomes. We now introduce a new pass named the ECCA
pass. In this pass all of the analyses and tasks done by ECCA will be performed.
In choosing the optimal location for the ECCA pass, we considered going
as low as possible bypassing many optimizations and still achieving machine
independence. A satisfactory location is between the Second Common Sub
Expression Elimination pass and the Stupid Register Allocation
pass. We also achieve independence by using the compiler's internal routines
for code flow analysis and assertion code generation in ECCA. By implementing
ECCA at the RTL level of compilation, we achieve the following.
- Portability across programming languages and architectures.
- Reduction in overhead since the placement of the assertions is performed
after most of the compiler's optimizations are done.
- More control over system resources needed for the assertions.
An evaluation of the high level ECCA (assertions are
added to the high level source code rather than the intermediate level code)
with a single BFI per block was performed using a software fault injection
tool named FERRARI developed at UT Austin [3].
FERRARI supports the following fault models: 
- AddIF: Address line error resulting in executing a different
instruction.
- AddIF2: Address line error resulting in execution two instructions.
- AddOF: Address line error when a data operand is fetched.
- AddOS: Address line error when an operand is stored.
- DataIF: Data line error when an operand is fetched.
- DataOF: Data line error when an operand is loaded.
- DataOS: Data line error when an operand is stored.
- CndCR: Errors in condition code flags.
Numerous experiments, conducted on a SUN SPARC workstation, were performed
on various spec benchmark programs. The experiments were repeated 4000 times
for each fault model for each spec benchmark program. Figure 2 shows the
reduced percentage of undetected errors as result of using ECCA on Esspresso.
Note that almost 100% fault detection has been achieved as a result of using
ECCA. The figure also demonstrates the ability of a data error to manifest
it self into a control flow error. Other spec benchmark programs showed
similar results.
References
- 1
- Z. Alkhalifa and V. S. S. Nair, ``Design of a Portable Control-Flow
Checking Technique'', Proceedings of the High-Assurance Systems Engineering
Workshop, Aug. 1997, pp 120-123.
- 2
- L. D. McFearin and V. S. S. Nair, ``Control Flow Checking Using Assertions'',
Proceedings of IFIP International Working Conference, Jul. 1994,
pp 103-112.
- 3
- G. Kanawati, N. Kanawati and J. Abraham, ``FERRARI: A Flexible Software-Based
Fault and Error Injection System'', IEEE Transactions on Computers,
Feb. 1995, pp 248-260.
|