ODC - Orthogonal Defect Classification
Data

To use this data, one has to understand the subprocesses of faults and failures in order to extract the right measurements and devise appropriate filter mechanisms for the data. Figure 1 shows a state transition diagram of the key events relevant to us. We describe both the failure process and the fault process to provide a clear understanding of the service process and to gain insight into the data available.

When a customer has a problem with a software product, they can call the customer support service. This facility is available for all kinds of problems customers may face, including failures due to software, requests for how-to information, installation questions, and so on. Most calls do not relate to software failure. A small fraction of calls, however, do report software failures resulting from defect-oriented problems, and these are the focus of this paper.

When a failure is reported and identified as a potential code-related failure, a problem record is created and an investigation begins. The investigation searches the failure database to see whether the problem is known and a fix is readily available. If the investigation yields an immediate solution, namely the rediscovery of a known fault, the fix is dispatched and the failure tracking cycle terminates.

Sometimes the investigation determines that the failure corresponds to a new fault. In this case a new fault report is initiated, which in IBM parlance is called an Authorized Program Analysis Report (APAR). The failure tracking process is then not yet complete; it waits for the fault process to complete. A change team fixes the fault, makes the fix available to the service team, and documents the fault's characteristics so that future failures due to this fault can be easily recognized. Change teams sometimes exist within the development organization and follow its practices for tracking defects.
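The failure-tracking flow described above can be sketched as a small state model. This is only an illustration, not the actual service system: the class `ServiceDesk`, its fields, the returned status strings, and the idea of matching failures by a `symptom` string are all assumptions made for the example. The "awaiting fix" state for a rediscovery of a fault whose fix is still pending is likewise an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDesk:
    """Toy model of the failure-tracking flow; all names are hypothetical."""

    # Maps a failure symptom to the identifier of the known fault (APAR).
    known_faults: dict = field(default_factory=dict)
    # APARs that have been opened but not yet fixed by the change team.
    open_apars: set = field(default_factory=set)
    next_apar: int = 1

    def report(self, symptom: str, code_related: bool) -> str:
        """Handle one customer call and return the resulting state."""
        if not code_related:
            # How-to questions, installation help, etc.: no problem record.
            return "answered"
        apar = self.known_faults.get(symptom)
        if apar is None:
            # New fault: open an APAR; the failure record waits on it.
            apar = f"A{self.next_apar}"
            self.next_apar += 1
            self.known_faults[symptom] = apar
            self.open_apars.add(apar)
            return f"new fault {apar}, awaiting fix"
        if apar in self.open_apars:
            # Assumed case: known fault whose fix is not yet available.
            return f"rediscovery of {apar}, awaiting fix"
        # Rediscovery of a known, already fixed fault: dispatch the fix.
        return f"rediscovery of {apar}, fix dispatched"

    def fix_available(self, apar: str) -> None:
        """The change team delivers the fix for an APAR."""
        self.open_apars.discard(apar)
```

Calling `report` twice with the same symptom opens an APAR on the first call only; after `fix_available` is called for that APAR, later reports end with the fix being dispatched.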
A more detailed description of the service process can be found in [CRGR93] and [SC91]. Not every customer experiences every fault; in fact, several faults are reported only once, because the failure is triggered by the peculiarities of a particular environment. Customers have access to all fixes for known faults. Customers usually upgrade their software at periodic intervals to include all known fixes. On the other hand, several customers do not want to fix what is not broken and are very selective in applying fixes. Some faults, however, are considered highly pervasive by the service team, which urges customers to install the fixes for them as soon as possible.
Figure 2 shows the impact of different faults and failures as they would appear on a time line across the entire customer base. Failures are shown on the upper side of the time line and faults on the lower side: a vertical bar above the line represents a failure, and one below the line represents a fault. The very first time a failure (in IBM terms, a Problem) is reported, it causes the creation of a new fault record (APAR) to address the failure. In the figure, P1 represents the failure that leads to the identification of the new fault A1. Subsequently there can be multiple occurrences of the failure P1 that do not require the creation of an APAR. Instead, the investigation soon determines that the failure corresponds to the fault A1, and the existing repair is made available to the customer. Failures P1 and P2 recur at different customer sites and are shown as repeated occurrences along the time line.
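The time line can also be read as a log of failure events, each traced to a fault. The sketch below separates the first report of each fault (the one that opens the APAR) from later rediscoveries. The event times and the pairing of P2 with a fault labeled A2 are invented purely for illustration and are not data from the paper; the function name `summarize` is also hypothetical.

```python
from collections import Counter

# Hypothetical failure log: (time, problem id, fault id it was traced to).
failures = [
    (1, "P1", "A1"),
    (3, "P2", "A2"),
    (4, "P1", "A1"),
    (6, "P1", "A1"),
    (7, "P2", "A2"),
]


def summarize(log):
    """Return, per fault, (time of first report, number of rediscoveries)."""
    first_seen = {}
    rediscoveries = Counter()
    for time, _problem, fault in sorted(log):
        if fault not in first_seen:
            first_seen[fault] = time   # this failure opens the APAR
        else:
            rediscoveries[fault] += 1  # later failures map to the same fault
    return {f: (first_seen[f], rediscoveries[f]) for f in first_seen}
```

For the sample log, `summarize(failures)` reports A1 as first seen at time 1 with two rediscoveries, and A2 as first seen at time 3 with one.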
rchill Wed Mar 31 12:29:44 EST 1999