Problems with Software Reliability Composition
Dave Mason and Denise Woit
Ryerson Polytechnic University
Toronto, Canada M5B 2K3
Abstract
Progress is being made toward calculating software system
reliability from the reliabilities of the components and information about
the structure of their interactions.
This paper enumerates the outstanding problems and provides solutions,
or pointers to solutions, for each.
Introduction
Software system reliability estimates are typically based upon data collected
while testing the system as a whole [Mus97]. However, there is growing interest in estimating
system reliability from the reliabilities of its constituent components.
This technique is both pragmatically appealing and supportive of the
treatment of software development as an engineering discipline.
Pragmatically, the technique has potential for increasing the cost-effectiveness
of reliability estimation and encouraging code reuse by creating a market
for components with certified reliabilities. It also supports the transformation
of software engineering into a discipline more like that of traditional
engineering; calculation of overall reliability from component reliability
is routine in other engineering disciplines, and it is not unreasonable
to consider this approach for software [Ber97].
In the hardware realm, Markov-based models are commonly used to calculate
system reliability from component reliabilities; this approach is preferred
because of its cost-effectiveness. Because the underlying mathematical
models for such calculations assume component independence, hardware components
are designed to be as independent as possible; any remaining dependencies
are factored into the models [Lew96, Lyu96].
Unfortunately, the hardware models of reliability composition are considered
by many to be inapplicable in the software realm because software components
tend to violate the component independence assumption of the basic model.
It is widely considered impossible or intractable to design software components
to meet this requirement [Lyu96].
We are constructing design rules that allow the development of software
components with the necessary independence, and with interaction properties
that parallel those of physical systems, so that they are amenable to
analysis with Markov models. The use of functional and message-passing
paradigms facilitates the construction of these highly independent components.
We show that applying our rules can result in systems that do not
violate the underlying assumptions of the typical reliability composition
models.
Outstanding Problems
There are four problems that need to be addressed before software reliabilities
can be composed, as is common practice in hardware systems. While some
people seem to believe that some of these problems are insurmountable,
we believe we are making good progress toward solving them.
Invokes versus Uses
The first problem is structural: a Markov model has a set of states;
the transitions between them are final transitions, not calls with subsequent
returns. Thus, in a Markov model, a component ``invokes'' another component.
Unfortunately, in the traditional software structure model, a component
``uses'' another component (call and subsequent return). For the Markov
model to apply, this structure must be converted so that each component
``invokes'' other components. The difference between ``invokes'' and ``uses''
was first noted in [Par74].
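To make the role of the ``invokes'' structure concrete, the following is a minimal sketch of how a system reliability could be computed once every component only invokes others. It assumes a common absorbing-Markov-chain formulation, which is not necessarily the exact model used by the authors: each component has a reliability, each successful execution transfers control to a next component with a given probability, and a pseudo-state named "EXIT" (our own label) marks successful termination. The component names and all numbers are hypothetical.

```python
def system_reliability(reliability, transitions, start):
    """Probability that execution starting at `start` reaches EXIT without failure.

    reliability: dict mapping component -> probability one execution succeeds
    transitions: dict mapping component -> {next component or "EXIT": probability}
    Assumption: failures are independent and transitions depend only on the
    current component (first-order Markov).
    """
    names = list(reliability)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)

    # Build (I - Q) x = b with Q[i][j] = r_i * P(i -> j), b[i] = r_i * P(i -> EXIT).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for name, i in index.items():
        for target, p in transitions[name].items():
            if target == "EXIT":
                b[i] += reliability[name] * p
            else:
                a[i][index[target]] -= reliability[name] * p

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution: x[i] is the success probability starting from component i.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - tail) / a[i][i]
    return x[index[start]]


# Hypothetical three-component system: A invokes B or C, each of which exits.
example_reliability = {"A": 0.99, "B": 0.98, "C": 0.95}
example_transitions = {
    "A": {"B": 0.6, "C": 0.4},
    "B": {"EXIT": 1.0},
    "C": {"EXIT": 1.0},
}
example_result = system_reliability(example_reliability, example_transitions, "A")
```

For this hypothetical system the result equals 0.99 * (0.6 * 0.98 + 0.4 * 0.95). Note that no transition ever returns to a caller; a call-and-return structure cannot be expressed in this form without first being converted.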
One approach to transforming ``uses'' into ``invokes'' is to split modules
at the points where they call other modules. The fragment before the split
becomes one component, and the fragment after the split becomes another
component. If such a split occurs inside a loop, further
splits must be performed at the top and end of the loop (and similar transformations
must be performed for conditional statements). This program transformation
is a common technique used in compilers for functional programming languages.
The resulting form is called Continuation Passing Style (CPS) because
the continuation point for a function call becomes an explicit parameter
to that function. CPS transformations may be performed manually, or automatically
using appropriate transformation utilities [WM98].
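The splitting described above can be sketched on a small example. The functions below are hypothetical and are not taken from [WM98]; they only show the shape of the result. The direct-style version ``uses'' square twice, while the CPS version is split into fragments at each call, with the continuation passed as the explicit parameter k.

```python
# Direct style: sum_of_squares "uses" square (call and subsequent return).
def square(x):
    return x * x


def sum_of_squares(a, b):
    return square(a) + square(b)


# Continuation passing style: every transfer of control is an "invoke".
def square_cps(x, k):
    # Instead of returning to a caller, hand the result to the continuation.
    return k(x * x)


def sum_of_squares_cps(a, b, k):
    # Fragment 1: the code before the first call.
    return square_cps(a, lambda a2: after_first_call(a2, b, k))


def after_first_call(a2, b, k):
    # Fragment 2: the code between the first and second calls.
    return square_cps(b, lambda b2: after_second_call(a2, b2, k))


def after_second_call(a2, b2, k):
    # Fragment 3: the code after the second call.
    return k(a2 + b2)


# The identity function serves as the final continuation.
cps_result = sum_of_squares_cps(3, 4, lambda r: r)
```

Each of the three fragments would become a separate node in the Markov model. Python does not eliminate tail calls, so the interpreter still grows its stack here; the example illustrates the structure only, not an efficient implementation.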
Another approach is to simply build systems so that inter-component
``uses'' are not present. With care, a message-passing programming system
can be built with the invoke characteristics.
The only serious complication with these techniques is that a component
reliability number must be assigned to each fragment, regardless of whether
it was originally the start of a module or a continuation. This may make
it difficult to develop operational profiles for each fragment.
State Independence
To combine component reliabilities, the implementation
details of one correct module must not affect the correctness of another module
in the system. In effect, this means that no global state can be affected,
except as required in the specification. For these purposes, global state
includes the contents and positions of files and I/O devices, in addition
to the more obvious global variables.
Programming in a (mostly-)functional style is one way to achieve this,
but a similar result could be achieved in an imperative style, with sufficient
tool support to track the use of global state throughout the components
of a system.
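As a small illustration of the difference, consider a hypothetical identifier allocator (our own example, not one from the source). The first version keeps its state in a global variable, so any other module that touches that variable can change its behaviour. The second version receives the state as an argument and returns the new state, so its result depends only on its inputs.

```python
# Imperative version: hidden global state shared with every other module.
_last_id = 0


def allocate_id_impure():
    global _last_id
    _last_id += 1
    return _last_id


# Functional version: the state is an explicit input and output.
def allocate_id_pure(last_id):
    """Return (new id, updated state) without touching any global state."""
    new_id = last_id + 1
    return new_id, new_id
```

Calling allocate_id_pure twice with the same argument always yields the same pair, whereas the result of allocate_id_impure depends on every earlier call made anywhere in the system.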
First Order Markov Models
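If a given module is ``used'' from several places in the program, a straightforward
conversion to an ``invokes'' model merges otherwise unrelated paths at that module's
node. Because a first-order Markov model selects the next state from the current
state alone, the shared node retains no record of which caller invoked it. The model
therefore contains path artifacts (paths that cannot occur in the program),
which may have a significant impact on the calculation of system reliability.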
This problem can be resolved by node cloning in the control flow graph:
new nodes are created for the distinct calls
to a function from various points in the graph. This resolves the problem
but can lead to node explosion, particularly if there are mutual (non-tail)
recursive calls among two or more modules (which fortunately seems
fairly rare in practice). Cloning can also make operational profiles harder to develop.
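The effect of node cloning can be sketched on a toy graph. The representation below is an assumption made for illustration: each call site is a (caller, callee, continuation) triple, and the node names are hypothetical. In the naive model both call sites share the node log, so the graph contains a path from read_a through log to b_done that the program can never take. Cloning log once per call site removes it.

```python
def naive_edges(call_sites):
    """One shared node per callee: caller -> callee -> continuation."""
    edges = set()
    for caller, callee, continuation in call_sites:
        edges.add((caller, callee))
        edges.add((callee, continuation))
    return edges


def cloned_edges(call_sites):
    """One clone of the callee per call site, named callee#<site index>."""
    edges = set()
    for site, (caller, callee, continuation) in enumerate(call_sites):
        clone = f"{callee}#{site}"
        edges.add((caller, clone))
        edges.add((clone, continuation))
    return edges


def two_step_paths(edges):
    """All paths of the form a -> b -> c present in the graph."""
    return {(a, b, d) for (a, b) in edges for (c, d) in edges if b == c}


# Two hypothetical call sites that both use the same module, log.
sites = [("read_a", "log", "a_done"), ("read_b", "log", "b_done")]

naive_paths = two_step_paths(naive_edges(sites))
cloned_paths = two_step_paths(cloned_edges(sites))
```

Here naive_paths contains four paths, two of which are artifacts, while cloned_paths contains only the two paths that the program can actually follow. The price is one extra node per call site, which is the source of the node explosion mentioned above.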
Operational Profiles
Reliability numbers are generally considered believable only in the context
of an operational profile. Several of the solutions above increase the
difficulty of identifying a viable operational profile. It is important
to note that for non-safety-critical applications, reasonable variations
in the accuracy of the operational profile cause only small changes in
the final estimate of system reliability, as was shown in [Mus94].
For example, an operational profile that is ``off'' by 10 percent is reported
to cause the final reliability estimate to differ by only 2-3 percent.
We have encountered similar empirical results. In non-safety-critical
applications, this sensitivity level is acceptable. Thus, we believe that
system reliability estimates may not be overly sensitive to our additional
operational profile constraints. This belief is supported by some preliminary
results from our experiments.
Operational profiles have been used with good results on many systems
[Mus97, Lyu96], and work continues in this area. Advances in operational
profile specification will have positive implications for all of Software
Reliability Engineering and will mitigate some of the problems associated
with reliability composition.
Conclusions
Although more work remains to be done, we are making progress toward
the routine composition of component reliabilities to calculate system reliability.
References
- [Ber97] L. Bernstein. Software dynamics: Planning for the next century.
  In Proc. 10th Intl. Software Quality Week, San Francisco, U.S.A., May 1997.
- [Lew96] E. Lewis. Introduction to Reliability Engineering, 2nd edition.
  John Wiley, Toronto, 1996.
- [Lyu96] M. Lyu, editor. Software Reliability Engineering.
  McGraw-Hill, New York, 1996.
- [Mus94] J. Musa. Sensitivity of field failure intensity to operational profile errors.
  In Proc. 5th Intl. Symposium on Software Reliability Engineering, Nov. 1994.
- [Mus97] J. Musa. Applying operational profiles in testing.
  In Proc. 10th Intl. Software Quality Week, San Francisco, U.S.A., May 1997.
- [Par74] D. Parnas. On a ``Buzzword'': Hierarchical structure.
  In Proc. IFIP Congress 74. North-Holland Publishing Co., 1974.
- [WM98] D. Woit and D. Mason. Software system reliability from component reliability.
  In Proc. 9th Software Reliability Engineering Workshop, Ottawa, Canada, July 1998.
The original of this page may be found at http://sarg.ryerson.ca/sarg/papers/1998.08.15-issre-fa