Francesca Saglietti
Institute for Safety Technology (ISTec) Garching (D)
saf@istecmuc.grs.de
In recent times the software engineering community has invested considerable
effort in attempts to formalise the decision process underlying licensing
procedures for safety-critical embedded systems based on software-intensive
components. As with more conventional technologies based on mechanical or
electrical devices, the licensing process first requires an evaluation of
the risk introduced by automation, compared with the inherent risks of the
uncontrolled application. The main outcome of this analysis is the
identification and classification of failure criticality; on this basis,
minimum reliability requirements are determined, which must then be
demonstrated for the purpose of certification.
Product Reliability
According to most international standards, the quantitative notion of
software reliability refers to the probability of operational survival,
i.e. the probability of correct performance under the expected,
application-specific operational demand profile. Although software faults
are deterministic, a probabilistic approach is justified by the randomness
of input selection, which is subject to unpredictable physical state
transitions of the underlying technical process. Thus, software reliability
can be evaluated quantitatively on the basis of operational evidence: a
large number of independent and representative scenarios are executed, and
an upper bound on the failure probability is then estimated using
statistical sampling theory. Provided this testing phase is carried out
accurately, it provides the rationale for deciding whether:
- the system fulfils ultrahigh reliability (F), or
- the system does not fulfil ultrahigh reliability (N).
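The sampling-theory step can be illustrated with a standard result (not taken from this article): if n independent, representative test cases are executed without any failure, the per-demand failure probability p is bounded from above at confidence level C by solving (1 - p)^n = 1 - C. The minimal Python sketch below assumes failure-free testing and statistically independent demands; the function names are hypothetical.

```python
def failure_prob_upper_bound(n_tests: int, confidence: float) -> float:
    """Upper confidence bound on the per-demand failure probability p after
    n_tests independent, representative, failure-free test cases.

    Solves (1 - p) ** n_tests = 1 - confidence for p.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


def fulfils_target(n_tests: int, confidence: float, target: float) -> bool:
    """Decision F (True) or N (False): is the bound at or below the target?"""
    return failure_prob_upper_bound(n_tests, confidence) <= target


bound = failure_prob_upper_bound(4603, confidence=0.99)
print(f"upper bound after 4603 failure-free tests: {bound:.6f}")
```

Note that a failure during testing invalidates this simple formula; the bound then has to be derived from the binomial distribution with the observed number of failures.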
The number of test cases required to demonstrate ultrahigh reliability
targets, together with the difficulty of anticipating the operational
profile, often seriously hampers a statistically significant reliability
evaluation of safety-critical software.
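To give a sense of the scale involved, the sketch below inverts the standard failure-free testing bound (1 - p)^n <= 1 - C and prints the minimum number of tests for targets from 10^-3 down to 10^-7. It assumes independent, representative, failure-free tests and a 99% confidence level; the function name and the chosen targets are illustrative assumptions, not values from the article.

```python
import math


def required_failure_free_tests(target: float, confidence: float) -> int:
    """Smallest n with (1 - target) ** n <= 1 - confidence, i.e. the number of
    independent, representative, failure-free tests needed to bound the
    per-demand failure probability by `target` at the given confidence."""
    if not 0.0 < target < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("target and confidence must lie strictly between 0 and 1")
    # log1p keeps precision for very small targets such as 1e-7.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target))


if __name__ == "__main__":
    for exponent in range(3, 8):
        target = 10.0 ** -exponent
        n = required_failure_free_tests(target, confidence=0.99)
        print(f"p <= 1e-{exponent} at 99% confidence: {n:,} failure-free tests")
```

At 99% confidence the count grows roughly as 4.6 / p (since ln 100 is about 4.6), so each additional order of magnitude in the reliability target multiplies the testing effort by about ten.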
Expert Judgement
For this reason, and in order to support early fault detection through
transparent and sound design methods, independent assessors usually also
include qualitative aspects in their quantitative judgement, based on
non-operational evidence, i.e. evidence from the life-cycle phases
preceding operation (among others: development process, safety culture,
documentation, resources invested, inspections, static analysis,
non-operational tests).
The crucial problem in engineering judgement is the integration of
heterogeneous (operational and non-operational) evidence into a single
probabilistic statement. The validity of such an integration depends mainly
on the available expertise, which may be either of a frequentist nature,
i.e. factual knowledge of the impact of the process on product quality,
available from large populations of identical or comparable (typically
physical) devices developed by standardised manufacturing procedures; or of
a subjective nature, i.e. a personal expectation of the impact of the
process on product quality, based on individual observations of "similar"
development projects.
The assessment process can have one of two outcomes:
- acceptance of the software-based system (A), or
- rejection of the software-based system (R).
To provide a formal framework for integrating such inhomogeneous evidence,
diagnosis activities in several technical fields are supported by Bayesian
Belief Networks (BBNs): the assessment process is modelled by a network
that updates prior probabilities, based on historical experience, in the
light of present factual knowledge.
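A general BBN can encode arbitrary dependencies between evidence nodes. As a much-simplified sketch, assume a single hypothesis node (F or N) and several evidence items that are conditionally independent given that node; updating the prior then reduces to multiplying the prior odds by one likelihood ratio per item. This naive-Bayes simplification is an assumption, and the evidence names and likelihood values below are purely hypothetical, not taken from this article or the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One observed item of (operational or non-operational) evidence.

    p_given_f / p_given_n: probability of observing this item if the system
    does (F) or does not (N) fulfil the ultrahigh reliability target.
    """
    name: str
    p_given_f: float
    p_given_n: float


def posterior_f(prior_f: float, evidence: list[Evidence]) -> float:
    """Posterior P(F | all evidence), assuming the evidence items are
    conditionally independent given F/N (naive-Bayes simplification)."""
    if not 0.0 < prior_f < 1.0:
        raise ValueError("prior_f must lie strictly between 0 and 1")
    odds = prior_f / (1.0 - prior_f)  # prior odds F : N
    for item in evidence:
        if item.p_given_n <= 0.0:
            raise ValueError(f"{item.name}: p_given_n must be positive")
        odds *= item.p_given_f / item.p_given_n  # likelihood ratio of this item
    return odds / (1.0 + odds)


# Hypothetical evidence items, for illustration only.
observed = [
    Evidence("documented development process", p_given_f=0.8, p_given_n=0.5),
    Evidence("static analysis without findings", p_given_f=0.7, p_given_n=0.4),
]
print(f"P(F | evidence) = {posterior_f(0.1, observed):.3f}")
```

Real BBN tools propagate beliefs through networks with intermediate nodes and dependent evidence; the sketch only shows the core idea of prior updating.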
It is not the intention of this article to detract from the qualitative
merit of BBNs in providing a transparent assessment structure (see
[Dah97], [Del97], [Nei96]). Rather, it analyses the statistical
significance of the quantitative outcome of BBNs in the special case of
ultrahigh software reliability assessment.
The main limitation of engineering judgement pointed out here concerns the
assumption of stable development: deriving sound a priori values from
historical, non-operational evidence presupposes homogeneous expertise
gained in a stable programming environment, whereas variance in the
development process limits the validity of subjective judgement. In
particular, assuming project homogeneity implies that the reliability
targets to be achieved and demonstrated are stable. At present this
assumption is generally not fulfilled; on the contrary, experience with
ultrahigh-reliability systems usually represents only a small fraction of
the overall set of applications considered:
$P(F) \ll P(N)$
In fact, most subjective opinions originate from lessons learnt from past
faults rather than from successful experience. In other words, reasoning
about process effectiveness relies more often on falsification than on
verification. Considering five safety integrity levels (as in IEC 65A), the
frequency of the applications experienced is likely to follow a normal
distribution, in which the highest reliability class may represent just one
case out of ten:
$P(F) / [P(F) + P(N)] \approx 0.1$
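The one-in-ten figure can be made plausible with a rough sketch: assume an underlying normally distributed quantity that is cut into five equally wide integrity classes centred on the mean, with the two outermost classes open-ended. The class width (in standard deviations) is a free, hypothetical parameter; the article does not specify it.

```python
from statistics import NormalDist


def class_shares(width: float, n_classes: int = 5) -> list[float]:
    """Share of applications per integrity class, lowest to highest.

    Hypothetical model: a standard normal variable is cut into n_classes
    classes of equal width (in standard deviations) centred on the mean;
    the two outermost classes extend to -inf and +inf.
    """
    dist = NormalDist()  # mean 0, standard deviation 1
    half = n_classes / 2
    # Inner class boundaries, e.g. for 5 classes: -1.5w, -0.5w, 0.5w, 1.5w
    boundaries = [(k - half) * width for k in range(1, n_classes)]
    cdf = [0.0] + [dist.cdf(z) for z in boundaries] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


for width in (0.8, 1.0):
    shares = class_shares(width)
    print(f"class width {width} sd: highest class share = {shares[-1]:.3f}")
```

Class widths of 0.8 and 1.0 standard deviations put roughly 0.12 and 0.07 of the applications into the highest class, respectively, i.e. the same order of magnitude as the one-in-ten estimate.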
Expert Reliability. Safety assessment based on subjective judgement
evidently depends on the quality of the expert. For the reasons above, the
probabilities of justified acceptance and of justified rejection are
unlikely to be better than:
$P(A \mid F) = P(R \mid N) \approx 0.9$
Certification Reliability. For the purpose of licensing, the probability
of successful certification (S), i.e. the probability that the system is
ultrahigh-reliable given that it has been accepted, is more crucial than
expert reliability. By analogy with the Harvard medical study reported in
[Gra98], applying Bayes' theorem to the figures above yields a surprisingly
low probability of successful certification:
$P(A) = P(A \mid N)\,P(N) + P(A \mid F)\,P(F) \approx 0.1 \cdot 0.9 + 0.9 \cdot 0.1 = 0.18$
$P(S) = P(F \mid A) = P(A \mid F)\,P(F) / P(A) \approx 0.09 / 0.18 = 0.5$
where $P(A \mid N) = 1 - P(R \mid N) \approx 0.1$ and $P(N) = 1 - P(F) \approx 0.9$.
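The certification figures can be checked with a few lines of Python using the article's assumed values P(F) of about 0.1 and P(A | F) = P(R | N) of about 0.9, where P(A | F) is the probability of acceptance given an ultrahigh-reliable system. The function name is hypothetical; the loop additionally shows, under the same prior, how P(S) reacts if expert reliability is raised.

```python
def certification_reliability(p_f: float, p_a_given_f: float,
                              p_r_given_n: float) -> tuple[float, float]:
    """Return (P(A), P(S)), where S is 'F given A', via the law of total
    probability and Bayes' theorem."""
    p_n = 1.0 - p_f
    p_a_given_n = 1.0 - p_r_given_n  # unjustified acceptance
    p_a = p_a_given_n * p_n + p_a_given_f * p_f
    p_s = p_a_given_f * p_f / p_a
    return p_a, p_s


p_a, p_s = certification_reliability(p_f=0.1, p_a_given_f=0.9, p_r_given_n=0.9)
print(f"P(A) = {p_a:.2f}, P(S) = {p_s:.2f}")  # P(A) = 0.18, P(S) = 0.50

# Sensitivity: same prior, increasingly reliable experts.
for r in (0.9, 0.99, 0.999):
    _, s = certification_reliability(p_f=0.1, p_a_given_f=r, p_r_given_n=r)
    print(f"expert reliability {r}: P(S) = {s:.3f}")
```

Even with an expert reliability of 0.99, P(S) rises only to about 0.92 under the same prior, which illustrates how strongly the low base rate P(F) dominates the result.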
Conclusion. This article aims to demonstrate a substantial weakness in the
quantitative assessment of ultrahigh software reliability based on
subjective judgement. To begin with, today's still unstable software
engineering practice limits the extrapolation of past experience to new
projects; this severely restricts expert reliability. Moreover, the high
variance in the reliability levels achieved and demonstrated so far
contributes to a dramatic decrease in the probability of successful
certification. However, a transparent description of the assessment
process undoubtedly offers qualitative support, which might in future be
extended to include sound quantification.
References
[Dah97] G. Dahll: "Safety Assessment of Software Based Systems", Safecomp97, Springer-Verlag 1997
[Del97] K. Delic, F. Mazzanti, L. Strigini: "Formalising Engineering Judgement on Software Dependability via Belief Networks", DCCA97, Springer-Verlag 1997
[Gra98] T. Grams: "Bedienfehler und ihre Ursachen" [Operating errors and their causes], Automatisierungstechnik Praxis 40, No. 3, 4, Oldenbourg-Verlag 1998
[Nei96] M. Neil, B. Littlewood, N. Fenton: "Applying Bayesian Belief Networks to System Dependability Assessment", Safety Critical Systems Club Symposium, Springer-Verlag 1996
Author contact: Institute for Safety Technology (ISTec) GmbH,
Forschungsgelände, 85748 Garching, Germany, phone: +49 89 32004-539,
fax: -300, e-mail: saf@istecmuc.grs.de.