Enhancing Survivability of Critical Information Systems

John C. Knight, Kevin Sullivan, Xing Du,
Chenxi Wang, Matt Elder, Ray W. Lubinsky
Department of Computer Science, University of Virginia
Charlottesville, VA 22903-2442
{knight, sullivan, xd2a, cw2e, mce7e, rwl}@cs.virginia.edu

John McHugh
Department of Computer Science, Portland State University
Portland, OR 97201
mchugh@cs.pdx.edu

 

Many large information systems have evolved to the point where organizations rely heavily upon them. In some cases, such systems are so widespread and so important that the normal activities of society depend upon their continued operation. We refer to these as critical information systems. Examples include banking and finance, transportation, and medical services [1]. Given the increasing dependence on these systems, the serious consequences of their failure, and their demonstrated fragility and vulnerability, their survivability needs to be improved. We define the survivability of a system as its ability to continue to provide service (possibly degraded) when various changes occur in the operating environment.

Most critical information systems are legacy software, are composed of Commercial-Off-The-Shelf (COTS) components, or both. They are large-scale distributed systems consisting of hundreds of computers spread nationally or even globally and connected by wide area networks. Our Guardian project [2] aims to enhance the survivability of such systems so that they tolerate failures and correlated malicious attacks, while minimizing modification of the existing systems. Our approach is characterized by:

  • A sensor- and actuator-driven control system framework. To minimize modification of existing systems and to provide flexible, relatively application-independent survivability management, we adopt a control system framework (Figure 1). It consists of two major parts: the sensors and actuators, and the control system. Sensors collect failure, attack, performance, and other information about the critical system; actuators execute functions that change its operation. The control system obtains the required information from the sensors, makes decisions based on the survivability requirements of the system, and directs the actuators to carry those decisions out. Sensors and actuators are implemented as shells around the existing systems, so they require minimal modification. A shell is a layer of software that logically surrounds a software artifact and either enforces some useful predicates on the state of the system of which the artifact is a component or supplements the functionality of the artifact in some crucial way.

    Figure 1. System framework.

  • Survivability specification. Critical systems are viewed and analyzed from a service perspective, and services are classified quantitatively according to their interdependence and their criticality to the overall function of the system. Failures and attacks are assessed by kind, by the number of simultaneous occurrences, and by the places where they occur. The survivability requirements are expressed as a set of predicates that state which services must survive under which failures and attacks.
  • Adaptive and dynamic service survivability management. Survivability management is the process of continuing to meet the survivability requirements in the face of changes in the system. The process has four phases: (1) Change detection detects changes (e.g., failures and attacks) that have occurred in the system. (2) Damage assessment assesses the damage caused by those changes. (3) System adjustment isolates the failed services, activates queuing mechanisms to buffer service requests directed to them, and slows the processing pace of the whole system so that the next phase has sufficient time to work before the whole system crashes. (4) System restoration and adaptation decides, based on the damage to the system and applications and on the availability of system resources, whether the damaged services can be restored or whether overall service must be reduced. It determines which functions will continue to be provided and switches the application from one design configuration to another according to the survivability requirements. Based on the history of changes that have occurred in the system, it also adapts the design configurations of services to provide more survivability for a given amount of system resources. Adaptation appears in a second way as well: the four phases may be applied differently to different services, depending on the criticality of the failed services.
  • Security issues. Securing the control system and the sensors and actuators is a critical issue that we address: the control system framework must not itself introduce new vulnerabilities. The control system adopts a hierarchical distributed structure to avoid a single point of failure and to improve system performance. Authentication is used to identify control system components, and the data transferred between them is protected by encryption. The control system runs on separate computers, but each sensor and actuator must reside on the same machine as the application it wraps. We are investigating methods to secure sensors and actuators in such an environment.
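
To make the framework described in the list above more concrete, the following is a minimal sketch in Python, not the Guardian implementation. It assumes a toy model in which each existing service is wrapped by a shell that acts as both sensor and actuator, survivability requirements are predicates over the reported changes, and a controller runs the four phases once per step. All names (Change, Shell, Requirement, Controller) and the rule that "restoration" simply re-enables a required service are illustrative assumptions, not taken from the project.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Change:
    """A change reported by a sensor (hypothetical record format)."""
    kind: str     # e.g. "failure" or "attack"
    site: str     # place where the change occurred
    service: str  # service affected by the change


@dataclass
class Shell:
    """Wraps one existing service; serves as both sensor and actuator."""
    service: str
    isolated: bool = False
    pending: deque = field(default_factory=deque)  # buffered requests
    reports: list = field(default_factory=list)    # changes observed so far

    def sense(self) -> list:
        """Sensor side: hand over and clear the observed changes."""
        seen, self.reports = self.reports, []
        return seen

    def isolate(self) -> None:
        """Actuator side: stop serving and start queuing requests."""
        self.isolated = True

    def restore(self) -> None:
        """Actuator side: resume serving."""
        self.isolated = False

    def request(self, payload: str) -> str:
        if self.isolated:
            self.pending.append(payload)
            return "queued"
        return "served"


@dataclass
class Requirement:
    """One survivability predicate: if `applies` holds for the current
    changes, every service in `must_survive` has to keep running."""
    applies: Callable[[list], bool]
    must_survive: frozenset


class Controller:
    def __init__(self, shells, requirements):
        self.shells = {s.service: s for s in shells}
        self.requirements = requirements
        self.history = []  # kept so a fuller version could adapt over time

    def step(self) -> set:
        # (1) Change detection: poll every sensor.
        changes = [c for s in self.shells.values() for c in s.sense()]
        if not changes:
            return set()
        self.history.extend(changes)
        # (2) Damage assessment: here, simply the set of affected services.
        damaged = {c.service for c in changes}
        # (3) System adjustment: isolate damaged services (requests queue up).
        for name in damaged:
            self.shells[name].isolate()
        # (4) Restoration and adaptation: restore only what the
        # specification requires; other damaged services stay degraded.
        required = set()
        for req in self.requirements:
            if req.applies(changes):
                required |= req.must_survive
        for name in damaged & required:
            self.shells[name].restore()
        return required
```

In this sketch a requirement such as `Requirement(lambda cs: len(cs) <= 2, frozenset({"payments"}))` reads as "under at most two simultaneous changes, the payments service must survive". A real controller would restore a service by switching it to another design configuration rather than by re-enabling it in place, and would secure the channel between controller and shells, both of which are omitted here.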

Many research challenges remain, including:
(1) Generation of control systems from specifications,
(2) Scalability of control system architectures, and
(3) High assurance systems on the Internet.

We are now studying the features of critical information systems, analyzing the feasibility of the approach, developing a systematic way to apply it, and applying it in a prototype system. The prototype emulates the payment system of the US banking system, with its survivability enhanced by our control system and sensor/actuator framework. Although its functionality is limited, it shows the potential of the approach and provides a testbed for significant follow-on research.

References

[1] J. C. Knight, M. C. Elder, J. Flinn, and P. Marx, Summaries of Three Critical Infrastructure Applications, Technical Report CS-97-27, Department of Computer Science, University of Virginia, December 1997.

[2] J. C. Knight, R. W. Lubinsky, J. McHugh, and K. J. Sullivan, Architectural Approaches to Information Survivability, Technical Report CS-97-25, Department of Computer Science, University of Virginia, September 1997.


1. This work is supported in part by the Air Force under grant F30602-96-1-0314.