ODC - Orthogonal Defect Classification


Identification of Sub-populations

We separate defects by their failure symptom and generate both the observed and the fitted reliability growth curve for each of the 15 symptoms. A qualitative inspection of the 15 growth curves (not shown) revealed considerable differences among some of them. This motivated the identification of symptom-groups, such that each symptom-group has a characteristic reliability growth that differs from the others.

Our interest is in quantifying the differences in the shape of the growth curve across symptoms. We therefore fit the inflection S-shaped growth curve using non-linear regression, specifically the Marquardt method. From the fit, we estimate the parameters N, phi and psi and their asymptotic 95% confidence intervals. Here N is the estimated number of defects for that specific symptom, while phi and psi characterize the growth curve and can be used to quantify the difference in growth experienced by the fifteen symptoms. We also calculate P sub T for each symptom, which measures the percentage of the defects detected up to time T.

Identifying the sub-populations of interest is best done by plotting the parameters phi, psi and P sub T in three dimensions. We use r instead of psi, since r relates to the independence of defects. Figure 2 shows such a plot. The horizontal plane contains the parameters r and phi, and the vertical axis contains P sub T. A lower value of r implies greater inflection and correspondingly more dependent defects. The parameter phi corresponds to the rate of detection: a lower phi corresponds to harder-to-detect defects and a higher phi to easier ones. Each arrow in the figure corresponds to the defects with one of the 15 symptoms. The height of the arrow shows the value of its P sub T, and correspondingly what fraction of the defects had been detected at time T. The position on the r - phi plane indicates, in relative terms, whether the defects are dependent and how hard they are to detect.

Symptoms with a low r and a high phi have relatively more dependent defects that are also easier to find. Symptoms with a high r and a low phi have relatively more independent defects that are harder to find, and so on. Thus, partitions of the r - phi plane correspond to defects that demonstrate different growth characteristics, which is what we want to identify.
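
As a rough illustration of the quantities described above, the following Python sketch evaluates an inflection S-shaped curve and derives r and P sub T from phi and psi. It assumes the commonly cited form of the model, m(t) = N (1 - exp(-phi t)) / (1 + psi exp(-phi t)) with psi = (1 - r) / r, which this section does not restate. The function names and the sample values of N, phi, psi and T are hypothetical and are not taken from the data.

```python
import math


def inflection_s_curve(t, n, phi, psi):
    """Expected cumulative defects found by time t (assumed model form)."""
    decay = math.exp(-phi * t)
    return n * (1.0 - decay) / (1.0 + psi * decay)


def r_from_psi(psi):
    """Recover r under the assumed relation psi = (1 - r) / r."""
    return 1.0 / (1.0 + psi)


def percent_detected(t_end, phi, psi):
    """P sub T: percentage of the estimated N defects found by time t_end."""
    return 100.0 * inflection_s_curve(t_end, 1.0, phi, psi)


def inflection_time(phi, psi):
    """Time at which growth stops accelerating; only positive when psi > 1."""
    return math.log(psi) / phi


# Hypothetical parameter values, chosen only to exercise the functions.
n, phi, psi, t_end = 200.0, 0.05, 100.0, 150.0
print("r               =", r_from_psi(psi))
print("P sub T (%)     =", percent_detected(t_end, phi, psi))
print("inflection time =", inflection_time(phi, psi))
```

Under this form, a larger psi (smaller r) delays the inflection point, and a smaller phi both delays it and lowers P sub T for a fixed T, which matches the qualitative reading of the r - phi plane given above.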

[Figure 2: three-dimensional plot of r, phi and P sub T for each of the 15 symptoms]

Note that the purpose is to identify sub-populations of defects that demonstrate different growth curves; symptom is only a means to that end. We cluster the symptoms in Figure 2 into symptom-groups such that each symptom-group identifies a sub-population of defects that experiences similar reliability growth characteristics. The circles drawn around the arrows identify clusters of symptoms whose phi, r and P sub T are close to each other. Four such symptom-groups (groups) have been identified in the figure. The choice of the first three groups as clusters is more obvious than that of the fourth. Each group typically contains between 20% and 30% of the defects.

We now divide the entire population of defects into the four sub-populations (symptom-groups) identified by the clusters of symptoms in Figure 2. For each symptom-group we generate the observed and fitted reliability growth curves, shown in Figure 3. The fit with four sub-populations will necessarily be better, since it is equivalent to fitting the data with more parameters. We therefore do not compare the sum of squares of error for the four sub-populations with that of the overall fit, since such a comparison is not meaningful. The parameters of the fitted model, namely phi, psi and N, are shown for each symptom-group in Table 2. Qualitative differences among the symptom-groups are more apparent from their growth curves, whereas the extent of the differences is better seen from their parameters. Notice that the parameters phi and psi have a reasonable separation among the groups.

The clustering approach is a reasonably good method for identifying sub-populations, provided it is done with an understanding of the underlying data. We have deliberately avoided complicated analysis, since verifying normality assumptions is often impossible with such data. Indeed, we strive to keep the analysis simple without straying far from the physical interpretation of the data.
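
The grouping in Figure 2 was done by inspection. Purely as an illustration of how points that lie close together in (r, phi, P sub T) could be grouped automatically, the sketch below links any two symptoms whose standardized coordinates are within a chosen distance and reports the resulting connected groups. This is a simple stand-in, not the procedure used here; the function names, the `max_gap` threshold and the toy coordinates are all hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(points):
    """Rescale each coordinate to zero mean and unit spread so no axis dominates."""
    columns = list(zip(*points))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return [
        tuple((v - c) / s for v, c, s in zip(point, centers, spreads))
        for point in points
    ]


def threshold_clusters(points, max_gap):
    """Label points so that neighbours closer than max_gap share a label.

    Two points end up in the same group if a chain of such neighbours
    connects them (connected components of the neighbour graph).
    """
    scaled = standardize(points)
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(scaled[i], scaled[j]) <= max_gap:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy (r, phi, P sub T) triples, invented for illustration only.
toy_points = [
    (0.001, 0.050, 90.0),
    (0.002, 0.048, 88.0),
    (0.012, 0.020, 55.0),
    (0.014, 0.018, 50.0),
]
print(threshold_clusters(toy_points, max_gap=1.0))
```

Any such automatic grouping depends heavily on the scaling and the threshold, which is one reason to check the result against the physical interpretation of r and phi rather than accept it as is.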
Group 1 contains defects with 6 symptoms that have a relatively low r and a high phi value. This should correspond to defects that are dependent but easy to find. The observed and fitted growth curves in Figure 3a show this: the group has the most inflection and the highest P sub T value.

Group 2 contains defects that have a slightly higher r value and a lower phi. This group still has the S shape, but is much less inflected than Group 1. The lower phi corresponds to defects that are harder to find, which is well reflected in the lower P sub T value.

Group 3 is unusual in that it consists of just one symptom, with a relatively low phi and low r value. This corresponds to defects that are more independent and also harder to find (reflected in a P sub T value much lower than that of Group 2, which has an almost similar r value). The growth curves of Group 2 and Group 3 have similar inflection, except that a significant number of defects in Group 3 are discovered late.

Group 4 is a collection of three symptoms whose r and phi values are not as close to each other as those in the other groups. This group is characterized by a low phi and a high r, corresponding to relatively independent and harder-to-detect defects. This sub-population is also the set of defects for which the S-shaped curve is not the best model: there is almost a multi-stage S-shape during the testing phase. We have put these defects in one group since they stand out from the other three groups. It is interesting that 30% of the defects, coming from 3 specific symptoms, tend to be different from the rest of the population.

In this section we used a clustering technique to identify sub-populations using the parameters of the inflection S-shaped growth model. We identified four sub-populations using the symptoms and called them symptom-groups. The symptom-groups differ in having relatively dependent or independent defects and in demonstrating either slow or quick growth. In the next section we investigate the cause of the defects (types), and in the subsequent section we relate the defect types to the symptom-groups identified in this section.

                               95% CONFIDENCE INTERVAL
GRP  PARM   ESTIMATE   STD ERROR      LOWER      UPPER
1    N       148.534      0.9282    146.700    150.368
1    PHI       0.049      0.0008      0.047      0.050
1    PSI    3164.837    436.6523   2302.137   4027.537
2    N       257.067     1.09645    254.907    259.226
2    PHI       0.031     0.00033      0.030      0.031
2    PSI     219.710    11.15628    197.741    241.678
3    N       178.490     2.70529    173.147    183.832
3    PHI       0.026     0.00061      0.025      0.027
3    PSI     254.467    24.62409    205.837    303.098
4    N       233.039    12.91579    207.562    258.516
4    PHI       0.019     0.00123      0.017      0.022
4    PSI      71.088    11.06830     49.255     92.920

Table 2. Parameter Estimates
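
As a quick check on the table, the following sketch recomputes each group's share of the total estimated defects from the N estimates and converts psi to r. The conversion assumes psi = (1 - r) / r, the usual parameterization of the inflection S-shaped model, which is not stated in this section; the variable and function names are hypothetical.

```python
# Point estimates copied from Table 2: group -> (N, phi, psi).
TABLE_2 = {
    1: (148.534, 0.049, 3164.837),
    2: (257.067, 0.031, 219.710),
    3: (178.490, 0.026, 254.467),
    4: (233.039, 0.019, 71.088),
}


def group_summary(table):
    """Return {group: (share of total N in percent, r derived from psi)}."""
    total_n = sum(n for n, _, _ in table.values())
    summary = {}
    for group, (n, _phi, psi) in table.items():
        share = 100.0 * n / total_n
        r = 1.0 / (1.0 + psi)  # assumed relation psi = (1 - r) / r
        summary[group] = (share, r)
    return summary


for group, (share, r) in group_summary(TABLE_2).items():
    print(f"group {group}: {share:5.1f}% of estimated defects, r = {r:.5f}")
```

Under that assumption, Group 1 has the smallest r and Group 4 the largest, in line with the description of the groups, and the shares come out at roughly 18%, 31%, 22% and 29% of the estimated defects.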



rchill
Thu Apr 1 16:01:58 EST 1999