RFDAT: A Residual Failure Distribution
Analysis Tool
Soo-jong Lee 1
Switching System Department
Switching & Transmission Technology Lab.
Electronics and Telecommunications Research Institute (ETRI)
A very large electronic system such as an ATM switching system
must strictly satisfy its functional specification and the
reliability stated in the user requirements from the early
development stage onward, so that it remains reliable and stable
in the field. The reason is that once such a complex and varied
technology has been deployed in the field, it is very difficult
to repair problems and restore the system to its original
condition. Because the ATM switching system is an integrated
system developed by many expert groups, many factors can
introduce errors. For example, defects may arise from the
partitioning of the development scope, from subjective elements
in requirements analysis, and from design and coding.
A Residual Failure Distribution Analysis Tool (RFDAT) has
been implemented as a failure analysis tool for the ATM switching
system under development, which is a core node in building
the B-ISDN. The long development cycle of this system raises
questions such as "how reliable is it?" and "when can we
release it?" Using RFDAT, real-time analysis of failures by
group becomes easy, and the release time of the related system
or software version can be estimated. RFDAT addresses the
following technical problems:
Debugging velocity. In the development phase, failures in many
areas are detected irregularly by several testers, and the
responsible developers should begin debugging them promptly.
The debugging velocity, both overall and per area, is essential
to increasing system reliability.
Whole distribution of residual failures in the past, present,
and future. From the debugging velocity, the residual failures
in the future can be estimated, so the whole cycle from the first
detection to the last debugging of failures can be analyzed.
Partial distribution of residual failures in a specified group.
Using the failure group codes, a specific failure area can be
analyzed with the same methods as the whole.
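The three quantities above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions,
not RFDAT's actual implementation: the record layout (group
code, detection week, debugging week or None), the function
names, and the sample data are all hypothetical, and debugging
velocity is taken here to mean the average number of failures
debugged per week over a recent window.

```python
from itertools import accumulate

def weekly_counts(records, n_weeks, group=None):
    """Count failures detected and debugged in each week.

    records: iterable of (group_code, detect_week, debug_week or None).
    group:   if given, only records with that failure group code are counted.
    """
    detected = [0] * n_weeks
    debugged = [0] * n_weeks
    for code, detect_week, debug_week in records:
        if group is not None and code != group:
            continue
        detected[detect_week] += 1
        if debug_week is not None and debug_week < n_weeks:
            debugged[debug_week] += 1
    return detected, debugged

def residual_series(detected, debugged):
    """Residual failures per week: cumulative detected minus cumulative debugged."""
    return [d - g for d, g in zip(accumulate(detected), accumulate(debugged))]

def debugging_velocity(debugged, window=4):
    """Average number of failures debugged per week over the last `window` weeks."""
    recent = debugged[-window:]
    if not recent:
        raise ValueError("no debugging data")
    return sum(recent) / len(recent)

# Hypothetical sample data: "SW" and "HW" are made-up failure group codes.
records = [("SW", 0, 2), ("SW", 1, 1), ("HW", 1, None), ("SW", 2, 3), ("HW", 3, None)]

whole = residual_series(*weekly_counts(records, n_weeks=4))
software_only = residual_series(*weekly_counts(records, n_weeks=4, group="SW"))
```

Passing a group code restricts the same calculation to one
failure area, which mirrors how the partial distribution reuses
the method applied to the whole.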
RFDAT is a framework for system quality assurance and evolution.
It is composed of the following parts:
Part for Weibull distribution: It analyzes the debugging
rate in terms of the period taken for debugging and applies the
results of that analysis to a Weibull distribution. The parameters
of the Weibull distribution are calculated automatically, and
the debugging model is built at the same time.
Part for trend table: The trend table is needed to estimate
future residual failures. The trend data at each period from
the present into the future are obtained by multiplying the
total number of residual failures at present by the corresponding
rate in the table. In this way, the failures not yet debugged
are distributed over future periods with the weights in the
trend table.
Part for integrated distribution figure: The integrated
residual failure distribution is composed of two parts: the actual
residual failure data from the first detection to the present,
and the trend residual failure data from the present to the future.
The residual failures at a given period are calculated by
subtracting the failures debugged up to that period from the
failures detected up to that period.
Part for failure area code: There are many failure field
groups, e.g. software, hardware, system, software environment,
hardware environment, system environment, operating system, operation
and maintenance, control system, database, and so on. Because these
failure field groups are recorded in the failure area code, RFDAT
can analyze each item for each group, e.g. the parameters of the
Weibull model, debugging velocity, trend table, goodness-of-fit
test, integrated residual figure, and so on.
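As a rough illustration of the Weibull and trend-table parts,
the sketch below fits a two-parameter Weibull distribution to
observed debugging durations and derives a table of rates from
it. The estimation method is not stated here, so the sketch
assumes median-rank regression with Bernard's approximation,
one common technique. It further assumes that the rate for k
periods ahead is the Weibull survival probability
exp(-(k/eta)^beta), i.e. that every residual failure is treated
as if its debugging clock started at the present. All function
names and sample durations are hypothetical.

```python
import math

def fit_weibull(durations):
    """Estimate Weibull shape (beta) and scale (eta) by median-rank regression."""
    t = sorted(durations)
    n = len(t)
    if n < 2 or t[0] <= 0:
        raise ValueError("need at least two positive durations")
    xs = [math.log(v) for v in t]
    # Bernard's approximation (i - 0.3) / (n + 0.4) for the median rank,
    # linearized as ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("durations must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope of the regression line
    eta = math.exp(mx - my / beta)    # recovered from the intercept
    return beta, eta

def trend_table(beta, eta, horizon):
    """Assumed rate per future period: fraction of today's residual failures still open."""
    return [math.exp(-((k / eta) ** beta)) for k in range(1, horizon + 1)]

def project_residuals(residual_now, table):
    """Multiply the present residual count by each rate in the trend table."""
    return [residual_now * rate for rate in table]

# Hypothetical debugging durations in weeks.
beta, eta = fit_weibull([0.5, 1.0, 1.5, 2.0, 3.0, 4.5, 6.0, 9.0])
forecast = project_residuals(40, trend_table(beta, eta, horizon=10))
```

Median-rank regression is used here only because it needs no
numerical optimizer; maximum likelihood estimation would be an
equally plausible choice.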
The above parts are shown in the context of the overall structure
of RFDAT in Figure 1. The remaining three parts, i.e. detection
information, debugging information, and failure area code, are
calculated automatically. Furthermore, RFDAT itself verifies, by
means of a goodness-of-fit test at the 95% confidence level, that
the model explains the actual data well.
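The specific goodness-of-fit test is not named here. As one
possible illustration, the sketch below applies a one-sample
Kolmogorov-Smirnov test of observed debugging durations against
a fitted Weibull model at the 5% significance level. It assumes
the large-sample critical value 1.36 / sqrt(n), which is only
an approximation for small n and is conservative when the
parameters were estimated from the same data. The function
names are hypothetical.

```python
import math

def weibull_cdf(t, beta, eta):
    """Two-parameter Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def ks_statistic(durations, beta, eta):
    """Largest gap between the empirical CDF and the Weibull CDF."""
    t = sorted(durations)
    n = len(t)
    d = 0.0
    for i, v in enumerate(t, start=1):
        f = weibull_cdf(v, beta, eta)
        # Compare the model with the empirical CDF just after and just before v.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def passes_ks_test(durations, beta, eta):
    """True if the model is not rejected at the 5% level (asymptotic critical value)."""
    n = len(durations)
    if n == 0:
        raise ValueError("need at least one duration")
    critical = 1.36 / math.sqrt(n)
    return ks_statistic(durations, beta, eta) <= critical
```

A chi-square test on binned durations would be another
reasonable choice, particularly when the data are recorded
only per week.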
The analysis of residual failure data using RFDAT has been
successfully performed. It analyzed 315 major failures collected
during testing of ATM switching system versions 3.2, 3.3, 3.4,
and 4.3 under development, from early 1996 until recently.
Figure 2 represents the curve of the synthesized residual
failures, which integrates the actual data from the first period
of detection up to the present (for example, up to week 25) with
the estimates for the future.
Based on the integrated residual failure distribution at each
period from the first detection to the debugging of failures,
we can continuously analyze the trend in the residual failures
and use it in deciding the release time of the related systems
and software versions.
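One way such a release decision could be read off the
integrated curve is sketched below. The release criterion is an
assumption made for illustration: the first future period in
which the projected residual count falls to or below a chosen
threshold. The function names, the threshold, and the sample
numbers are all hypothetical.

```python
def integrated_curve(actual_residuals, trend_rates):
    """Actual residual counts followed by counts projected from the trend rates."""
    if not actual_residuals:
        raise ValueError("need at least one observed period")
    present = actual_residuals[-1]
    # Each future value is the present residual count times the rate for that period.
    return list(actual_residuals) + [present * r for r in trend_rates]

def first_period_below(curve, threshold, start=0):
    """Index of the first period at or after `start` with residuals <= threshold, else None."""
    for i in range(start, len(curve)):
        if curve[i] <= threshold:
            return i
    return None

# Hypothetical data: three observed periods, then three projected ones.
actual = [5, 12, 20]
curve = integrated_curve(actual, [0.5, 0.25, 0.1])
release_period = first_period_below(curve, threshold=5, start=len(actual))
```

Starting the search at the first projected period keeps early
periods, when few failures had been detected yet, from being
mistaken for a release point.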
---------------------------------------------------------------------------------------------------------------------
1. Author contact: Switching System Department,
Switching & Transmission Technology Lab., Electronics and
Telecommunications Research Institute (ETRI), 161 Kajong-Dong,
Yusong-Gu, Taejon, 305-350, KOREA, Phone: +82-42-860-5584,
FAX: +82-42-860-5410, E-mail: sjlee@nice.etri.re.kr