ODC - Orthogonal Defect Classification


Service Process

In the world of commercial software, we all recognize that only a fraction of the failures experienced by customers are due to software faults. Failures associated with software can also result from operator error, the environment, and sometimes hardware failures that appear to be software failures. The goal of this paper is to quantify the failures caused by software faults, and by software faults alone. Isolating the failures due only to software is not a simple task. Failure logs, even when available, do not always contain the required data. Developers also view them with suspicion because the data is of poor quality, incomplete, and lacking in detail. The problem stems primarily from the difficulty of capturing failure events, compounded by systems that are poorly architected to capture software data. Unfortunately, this is true even of the best systems in this regard, such as MVS, UNIX, and VMS, which are better documented than most. Another significant problem with failure log data is that it is unavailable for a large fraction of the customer base. Collecting such data is tedious, sometimes impossible, and aggregating it across different generations of logs is often very hard [Buc93].

Figure 1: Fault and Failure Process

Using Service Process data holds much promise. Two independent reasons motivate this direction. First, the service process reaches out to the entire customer base. Second, this data is actively used to manage the business and product development. This latter point is an important link to recognize and exploit. Because failures in the field are caused by faults in the software that are ultimately fixed by development, the quality of this data is considerably better. Development organizations tend to account carefully for faults in the product but may not care much about tracking the failure process. Service organizations, on the other hand, track failures and repairs closely. Between these two, often disparate, sources of data lies a significant opportunity for measuring commercial software failure rates. Since failures are reported across the entire customer base of the product, the coverage of failure events experienced on the product is far greater. Although there is substantial under-reporting, this is still a much more representative event space than a careful study of a small subset of machines or customer accounts. To reasonably represent the entire customer set, it is necessary to use a large sample (at least several hundred), if not the entire population. Even when the data is available, being able to use it remains a challenge, and no simple task.
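To illustrate how the two sources can complement each other, the following is a minimal sketch, not the method used in this paper, of isolating failures attributable to software faults by keeping only service records that were traced to a fault later fixed by development. The record layout (`ServiceRecord`, its `cause` and `fault_id` fields), the cause labels, and the sample data are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceRecord:
    # Hypothetical customer-reported failure from the service process.
    record_id: str
    cause: str               # hypothetical labels: "software", "operator", "environment", "hardware"
    fault_id: Optional[str]  # development fault this failure was traced to, if any


def software_fault_failures(records, fixed_fault_ids):
    """Keep failures labeled as software AND traced to a fault development fixed."""
    return [r for r in records
            if r.cause == "software" and r.fault_id in fixed_fault_ids]


# Hypothetical sample data: service records plus the set of faults fixed by development.
records = [
    ServiceRecord("S1", "software", "F100"),
    ServiceRecord("S2", "operator", None),
    ServiceRecord("S3", "software", "F101"),
    ServiceRecord("S4", "hardware", None),
    ServiceRecord("S5", "software", None),  # suspected software, never traced to a fix
]
fixed_faults = {"F100", "F101"}

selected = software_fault_failures(records, fixed_faults)
fraction = len(selected) / len(records)
print([r.record_id for r in selected])  # ['S1', 'S3']
print(fraction)                         # 0.4
```

Requiring the link to a fixed fault is what filters out operator, environment, and hardware failures, as well as software-looking failures (such as `S5`) that were never confirmed by development.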





rchill
Wed Mar 31 12:29:44 EST 1999