Wrapping Windows NT Binary Executables for Failure Simulation
Anup K. Ghosh and Matt Schmid
Reliable Software Technologies
21515 Ridgetop Circle, #250
Sterling, VA 20166
{aghosh,mschmid}@rstcorp.com
www.rstcorp.com
Introduction
In this short paper, we describe a tool for testing the
reliability and robustness of Windows NT software applications under stressful
environmental conditions, i.e., under system resource failure conditions.
Windows NT systems are increasingly being deployed in mission-critical
applications such as command and control on US Navy ships [Bin
98]. However, as reported in July 1998, the Navy's Aegis missile
cruiser USS Yorktown suffered a significant software problem in the
Windows NT systems that control the ``smart ship'', effectively leaving
the ship dead in the water [Sla 98]. The ship had to
be towed to the Norfolk Naval shipyard because a database overflow error
(resulting from a divide-by-zero operation) caused the ship's propulsion
system to fail.
The research approach and prototype tool described here
are specifically designed to analyze commercial off-the-shelf (COTS) software
for Win32 systems where source code is not released, but binary executables
are available for dynamic analysis. The purpose of this research is to
assess the robustness of software applications to failing system resources
such as memory allocation functions and system I/O functions. The tool
gives an analyst the capability to artificially simulate stressful conditions
(e.g., complete memory utilization) that a program may experience during
its lifetime using simple toggle functions.
Approach
Given the constraint of working with binary executables
without resorting to decompilation techniques, the approach in this research
project has been to instrument interfaces between the application program
under analysis and the shared libraries within the operating system that
the application uses. The approach is to ``wrap'' a binary executable
with an instrumentation layer such that all interactions between an application
and the operating system can be captured, observed, perturbed, and questioned.
Windows NT applications import hundreds of system functions
from shared libraries called Dynamic Link Libraries (DLLs). As implied
by their name, these libraries are linked at run time. System DLLs
are good candidates for studying the effect of system failures on applications
because they typically contain the core functions within the operating
system that applications require. As such, they can be a single point
of failure in a system. If the core OS functions fail, the programs that
use them may fail in turn. For this reason, selectively simulating failures
of operating system resources (such as memory allocation/deallocation,
file system operations, and other system I/O operations) can identify
how robust, or conversely, how vulnerable, an application is to failing
system resources. Studying these failure modes is important in critical
applications where system resources may be unavailable during peak periods
when they are most essential.
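To make the notion of robustness to a failed resource concrete, the following is a minimal Python sketch, not part of the tool described here. All names (`failing_alloc`, `fragile_store`, `robust_store`, `survives`) are hypothetical, `None` stands in for the NULL that a failed Win32 allocation returns, and a Python `TypeError` stands in for the crash that a native application would suffer when it uses an unchecked NULL pointer.

```python
# Hypothetical illustration: two application routines using the same allocator.

def working_alloc(size):
    """Stand-in for an allocator that succeeds."""
    return bytearray(size)

def failing_alloc(size):
    """Stand-in for an allocator under simulated memory exhaustion."""
    return None  # models a NULL return

def fragile_store(alloc, data):
    """Uses the buffer without checking whether allocation succeeded."""
    buf = alloc(len(data))
    buf[:] = data  # raises TypeError when buf is None
    return bytes(buf)

def robust_store(alloc, data):
    """Checks the allocation result and degrades gracefully on failure."""
    buf = alloc(len(data))
    if buf is None:
        return None  # report the failure to the caller instead of crashing
    buf[:] = data
    return bytes(buf)

def survives(routine, alloc, data=b"abc"):
    """Return True if the routine completes without crashing."""
    try:
        routine(alloc, data)
        return True
    except TypeError:
        return False
```

Running both routines against `failing_alloc` separates the one that tolerates the failure from the one that does not, which is the kind of observation the failure simulation is meant to enable.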
Figure 1: Wrapping Executable Binaries
Wrapping Binary Executables
The approach taken to simulate failed system resources
is to wrap binary executables with an instrumentation layer that simulates
system failures. Figure 1 illustrates how program executables are wrapped.
The application's Import Address Table (IAT), which is used to look up
the addresses of imported DLL functions, is modified so that the entries
for wrapped functions point to the wrapper DLL. For instance, in Figure 1, functions
S1 and S3 are wrapped by modifying the IAT of the application. When functions
S1 and S3 are called by the application, the wrapper DLL is called instead.
The wrapper DLL, in turn, executes, providing the ability to modify, perturb,
question or simply log the request to the target DLL.
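The redirection in Figure 1 can be modeled in a few lines. This is a conceptual Python sketch only, not the tool's implementation: a dictionary stands in for the application's IAT, and the names `make_wrapper`, `wrap_imports`, and the stub functions `s1` to `s3` are hypothetical. A real wrapper would overwrite function addresses in the loaded executable image rather than dictionary entries.

```python
# Conceptual model only: a dict stands in for the Import Address Table (IAT).
# All names are hypothetical and chosen to mirror Figure 1.

def s1(x):
    return f"S1({x})"

def s2(x):
    return f"S2({x})"

def s3(x):
    return f"S3({x})"

# "Target DLL": the real implementations, keyed by exported name.
target_dll = {"S1": s1, "S2": s2, "S3": s3}

# "Application IAT": initially every import resolves to the target DLL.
app_iat = dict(target_dll)

call_log = []

def make_wrapper(name, target):
    """Build a wrapper that logs the call, then forwards it to the target."""
    def wrapper(*args):
        call_log.append(name)
        return target(*args)
    return wrapper

def wrap_imports(iat, names):
    """Redirect the selected IAT entries to wrapper functions."""
    for name in names:
        iat[name] = make_wrapper(name, iat[name])

# Wrap S1 and S3 only; S2 keeps pointing directly at the target DLL.
wrap_imports(app_iat, ["S1", "S3"])

# The application always calls through its IAT.
results = [app_iat["S1"](1), app_iat["S2"](2), app_iat["S3"](3)]
```

In this model the application code is unchanged: it still looks up `S1`, `S2`, and `S3` by name, and only the table entries decide whether a wrapper sits in the call path.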
When calling the target DLL function, the wrapper DLL
looks up the address of the target DLL function in its own IAT, then passes
the request on to the target DLL. Once the target DLL has executed the request,
the results, if any, are returned through the wrapper DLL to the application that
made the request. The wrapper DLL has the opportunity, again, to modify
or question the returned data from the system DLL. It is at this point
that system calls can be modified to simulate anomalous or failed behavior
from the system. An alternative is not to pass the system call from the
application to the system DLL, but rather simply return a failure condition
back to the requesting program. Note that in Figure 1 when function S2
is called, it is unadulterated by the wrapper.
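The three wrapper behaviors described above (forward unchanged, forward and then alter the result, or fail without forwarding) can be sketched as follows. This is an illustrative Python model under stated assumptions: the mode names, `make_wrapper`, and the stub `target_alloc` are hypothetical, and `None` stands in for the NULL that a failed Win32 allocation call returns.

```python
# Conceptual model only; the mode names and the stub are hypothetical.
PASS_THROUGH = "pass"        # forward the call, return the result unchanged
PERTURB_RESULT = "perturb"   # forward the call, then replace the result
FAIL_FAST = "fail"           # never call the target; report failure directly

target_calls = []

def target_alloc(size):
    """Stand-in for a system allocation function in the target DLL."""
    target_calls.append(size)
    return bytearray(size)

def make_wrapper(target, mode, failure_value=None):
    """Build a wrapper that handles calls according to the chosen mode."""
    def wrapper(*args):
        if mode == FAIL_FAST:
            # The request never reaches the target DLL.
            return failure_value
        result = target(*args)
        if mode == PERTURB_RESULT:
            # The target ran, but the caller is told that it failed.
            return failure_value
        return result
    return wrapper

passed = make_wrapper(target_alloc, PASS_THROUGH)(16)
perturbed = make_wrapper(target_alloc, PERTURB_RESULT)(32)
failed = make_wrapper(target_alloc, FAIL_FAST)(64)
```

The difference between the last two modes is visible only on the system side: in the perturbing mode the target function still runs and may consume resources, whereas in the fail-fast mode it is never invoked.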
Figure 2: Failure Simulation Tool
Failure Simulation Tool
The graphical interface of the prototype tool, shown in Figure
2, implements the wrapping procedure shown in Figure 1. The
tool provides the ability to instrument as many of the interfaces from
an application to the OS as desired. The window shows the memory functions
that can be instrumented with failure or success functions. Other system
functions are available for instrumentation via the System tab shown in
the window in Figure 2.
The success/failure functions can be toggled on or off
at any point during execution of the program. For instance, the GlobalReAlloc
and LocalAlloc functions are both shown toggled for failure. A
Success wrapping function indicates that calls are passed through without
modification. The Success and Failure columns show the number of times
the calls for a particular function are made under the success or failure
condition. For example, the LocalAlloc function was toggled to Failure
at some point during the testing, after which six successive calls to
LocalAlloc were failed via the instrumentation wrapper. The log of the
success/failures for each call is recorded during testing in the window
on the right-hand side of the interface, as well as to a log file.
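The toggle, counter, and log behavior described above can be modeled as a small class. This is a hedged Python sketch, not the tool's code: the class `ToggledFunction`, its attributes, and the log message format are hypothetical, and the three successful calls before the toggle are an assumption made for illustration (the text specifies only the six failed calls).

```python
class ToggledFunction:
    """Hypothetical wrapper with a run-time success/failure toggle."""

    def __init__(self, name, target, failure_value=None):
        self.name = name
        self.target = target
        self.failure_value = failure_value  # models a NULL return
        self.fail = False                   # start in Success mode
        self.success_count = 0
        self.failure_count = 0
        self.log = []

    def toggle(self, fail):
        """Switch between Success (pass-through) and Failure mode."""
        self.fail = fail

    def __call__(self, *args):
        if self.fail:
            self.failure_count += 1
            self.log.append(f"{self.name}: FAILURE")
            return self.failure_value
        self.success_count += 1
        self.log.append(f"{self.name}: SUCCESS")
        return self.target(*args)


# Stand-in for LocalAlloc; the real function is a Win32 API call.
local_alloc = ToggledFunction("LocalAlloc", lambda size: bytearray(size))

for _ in range(3):              # assumed number of calls before the toggle
    local_alloc(8)

local_alloc.toggle(fail=True)   # toggled to Failure during testing
failed_results = [local_alloc(8) for _ in range(6)]
```

After this run the counters correspond to the Success and Failure columns of the interface, and `log` plays the role of the per-call record shown in the right-hand window.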
Summary and Future Directions
This brief paper has provided an overview of an approach
and tool for simulating system failures for COTS application programs.
The approach is to wrap the application program binaries with an instrumentation
layer that can selectively fail particular system calls. The effect of
these failures can be observed to study the robustness of applications
under anomalous or stressful system conditions.
A prototype tool has been implemented that allows selective
failure of system resources on-the-fly during testing. The future direction
of this research will be to add increased capability for failing several
types of system resources. The tool will be used to study the effects
of system failures on critical applications.
References
[Bin 98] M. Binderberger. "Re:
Navy turns to off-the-shelf PCs to power ships," RISKS Digest,
19(76), May 25, 1998.
[Sla 98] G. Slabodkin. "Software
Glitches Leave Navy Smart Ship Dead in the Water," GCN Network,
Available online:
www.gcn.com/gcn/1998/July13/cov2.htm.
This work is sponsored by the Air Force Research Laboratory and the Defense Advanced
Research Projects Agency (DARPA) under Contract F30602-97-C-0117.