ODC - Orthogonal Defect Classification
Distribution change as a measurement
The concept of a classification system that becomes a measurement is best illustrated with an example. The following example brings several points together, namely semantic extraction, orthogonal classification, and finally the necessary and sufficient conditions for ODC. Figure 3 shows an example from a real-world project: a major component of an operating system product containing several tens of thousands of lines of code. The figure has two charts, a growth curve on top and a distribution on the bottom. The abscissa of the growth curve is time, in days, spanning the testing portions of development. For the purpose of analysis it is partitioned into four periods, 0, 1, 2 and 3, of 200 days each. The growth curve is a plot of the cumulative number of defects against time. If the curve flattens out, that is goodness, since no further defects are being uncovered. Here the curve appears to flatten out, but that is an artifact of testing having stopped; data beyond the release is not included! The total number of defects found through testing is around 800. Period 3, the last six to seven months, uncovered almost half of them. Period 3 was mostly system test, when it is desired that the product stabilize and that not too many defects be found. That was clearly not the case here. We cite this example because it illustrates some of the difficulties encountered in development and the challenge they pose for growth modelling. Somewhere in the middle of period 3, the defect find rates set off an alarm in the development organization. It was evident that more resources were needed for both testing and fixing the bugs, but the count of defects alone does not provide any insight into what might be a smart tactical solution. This is where knowledge of what is contained in the defects comes into play.
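The growth curve and the period breakdown in Figure 3 are simple computations over defect discovery dates. The Python below is a minimal sketch, assuming each defect is recorded as the day (0 to 799) on which it was found; the function names and the sample days are hypothetical and are not the project's data.

```python
from itertools import accumulate

PERIOD_LENGTH_DAYS = 200  # four analysis periods of 200 days each, as in the text
N_PERIODS = 4

def growth_curve(find_days: list[int], horizon: int) -> list[int]:
    """Cumulative number of defects found by the end of each day 0..horizon-1.

    Assumes every entry satisfies 0 <= day < horizon.
    """
    per_day = [0] * horizon
    for day in find_days:
        per_day[day] += 1
    return list(accumulate(per_day))

def period_shares(find_days: list[int]) -> list[float]:
    """Fraction of all defects found in each 200-day period (0, 1, 2, 3)."""
    counts = [0] * N_PERIODS
    for day in find_days:
        # Clamp late finds into the last period rather than dropping them.
        counts[min(day // PERIOD_LENGTH_DAYS, N_PERIODS - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical discovery days for ten defects (illustrative only).
days = [15, 120, 250, 310, 420, 610, 640, 700, 720, 790]
curve = growth_curve(days, horizon=N_PERIODS * PERIOD_LENGTH_DAYS)
print(curve[-1])            # 10
print(period_shares(days))  # [0.2, 0.2, 0.1, 0.5]
```

In this toy data half the defects land in period 3, the same shape of problem described above; neither the cumulative curve nor the period counts say anything about what kind of defects they are.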
We chose this project for a hindsight pilot precisely because of the problems it encountered in development, which classical growth modelling could not handle. We took defects from each of those periods and categorized them using a very carefully thought-through set of attributes and values. One of the attributes is called the defect type, which we describe below. The defect type is simply the meaning of what is changed to fix the problem. The defect types used for this pilot study are: function, assignment+checking, interface, timing, and documentation. The categories are simple and are meant to reflect a set of programming tasks, yet general enough to apply to errors anywhere between design and the field. The categorization really captures what was intended by the person fixing the bug. A function defect affects a capability that the program was supposed to provide, either to the user or to another program, whereas a checking defect is one where the program did not include a check of some condition, or did so incorrectly. A fault in inter-module communication would be an interface defect, while not holding a lock when one should would be a timing defect. The categories are basic programming concepts or practices captured in a few distinct values. In practice, the classification should provide the closest match to one of the identified values, thereby communicating the intent of the change. With this background on the defect type attribute, let us focus on the lower portion of Figure 3. The bars reflect the proportion of function defects in each period, when the defects are categorized by the five possible values above. In period 0, between 10 and 20 percent of the defects were of type function. Period 1 was higher at 25 percent, and the proportion continued to increase: periods 2 and 3 show it rising from 30 percent to 50 percent.
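The defect type is a single value chosen from a small, fixed set, and the lower chart of Figure 3 is just the share of one value per period. A minimal Python sketch, assuming each fix has already been classified; the enum and function names are hypothetical, and assignment+checking is treated as one value, as in the pilot.

```python
from collections import Counter
from enum import Enum

class DefectType(Enum):
    """The five defect-type values used in the pilot study."""
    FUNCTION = "function"
    ASSIGNMENT_CHECKING = "assignment+checking"
    INTERFACE = "interface"
    TIMING = "timing"
    DOCUMENTATION = "documentation"

def type_distribution(defects: list[DefectType]) -> dict[DefectType, float]:
    """Normalized distribution over all five types; absent types get 0.0."""
    counts = Counter(defects)
    total = len(defects)
    return {t: counts[t] / total for t in DefectType}

def function_share_by_period(defects_by_period: dict[int, list[DefectType]]) -> dict[int, float]:
    """Proportion of function defects per period, i.e. the bars in the lower chart."""
    return {
        period: type_distribution(defects)[DefectType.FUNCTION]
        for period, defects in sorted(defects_by_period.items())
    }

# Hypothetical classified defects for two periods (not the pilot's data).
F, A, I = DefectType.FUNCTION, DefectType.ASSIGNMENT_CHECKING, DefectType.INTERFACE
sample = {0: [F, A, A, I, A], 1: [F, F, A, I]}
print(function_share_by_period(sample))  # {0: 0.2, 1: 0.5}
```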
If one had a process where design precedes coding and testing follows coding, the increasing proportion of function defects, together with the increasing volume of defects, makes it look as if development were running backwards. Indeed, that is precisely the problem, and it is easily recognized from the distribution of defect types. Had this data been available during development, it could not only have clearly indicated the problems but anticipated them, well before the sharp increase in defects in period 3. Furthermore, it suggests the type of process correction that may be beneficial. The rise in function defects is indicative of a design problem, and one option is to start a set of reviews staffed by the people who best understand the design of the product. The post-mortem on the project told us that the project had slipped a few releases, key people had moved on, and it had been resurrected with new people while the requirements continued to change. What we are discussing here is the use of the semantic information buried in the defects, extracted through a simple but powerful classification scheme. Someone with the patience and attention span to read all 800 defects and understand their content might have reached the same conclusion by recognizing the common element of functional deficiency or incorrectness, which grew in intensity as the product advanced through testing. This is where the approach invites comparison with root cause analysis, and it is far more effective. The classification can become a measurement and, given an expectation, provide a measure of variance. The semantic nature of the classification, which is tied closely to the programming model, provides guidance on where the opportunities for process correction might lie. The classification scheme has to have certain properties so that it can indeed provide a measurement; these are called the necessary and sufficient conditions for ODC [CBC
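The text does not prescribe a particular statistic for the measure of variance. One common choice, used here purely as an illustration, is Pearson's chi-square statistic comparing observed defect-type counts with an expected distribution, paired with the per-type excess that points at the offending types. The function names, expected shares and observed counts below are all hypothetical.

```python
def chi_square_deviation(observed: dict[str, int], expected_share: dict[str, float]) -> float:
    """Pearson chi-square statistic of observed counts against expected proportions.

    Assumes every expected share is > 0 and the shares sum to 1.
    Types present in `observed` but not in `expected_share` are ignored.
    """
    total = sum(observed.values())
    stat = 0.0
    for defect_type, share in expected_share.items():
        expected_count = total * share
        stat += (observed.get(defect_type, 0) - expected_count) ** 2 / expected_count
    return stat

def per_type_excess(observed: dict[str, int], expected_share: dict[str, float]) -> dict[str, float]:
    """Observed share minus expected share for each defect type."""
    total = sum(observed.values())
    return {t: observed.get(t, 0) / total - s for t, s in expected_share.items()}

# Hypothetical expectation for a late test phase and hypothetical observed counts.
expected = {"function": 0.1, "assignment+checking": 0.3, "interface": 0.3,
            "timing": 0.2, "documentation": 0.1}
observed = {"function": 50, "assignment+checking": 20, "interface": 20,
            "timing": 5, "documentation": 5}

excess = per_type_excess(observed, expected)
print(round(chi_square_deviation(observed, expected), 2))  # 180.42
print(max(excess, key=excess.get))                         # function
```

A large statistic says the mix is off; the per-type excess says which category drives it, which is what makes the result actionable.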
The figure shows an expectation of the normalized defect type distributions by phase. Essentially, it says that for this kind of process we would like to see most of the function defects found early, with their share going down as the product goes through the process. It would also make sense for timing and serialization problems to show up only towards the end, because that is when the product runs on the real hardware; until then, the product is on a simulator or on something other than the intended hardware. In between these two extremes, the process should weed out the assignment and interface defects. Unit test probably has a peak in assignment or one-liner type defects, and interface defects tend to dominate when the new code is integrated with the rest and function tested. Through the articulation above, we have in essence used the distribution of defects almost like an instrument to measure the progress of the product through the process. In fact, the distribution indicates how far the product has matured. So, if a product is supposedly in system test, but the distribution of its defects looks similar to that of an earlier part of the process, and the volumes are not insignificant, then it clearly indicates a problem. It not only indicates the problem, but also suggests possible reasons, via the offending defect types that are throwing the distribution off from the expected. As it happens, distribution changes are far more sensitive, and can be recognized more easily, than cumulative measurements, providing a useful tool to the developer. Feedback is available to a developer right after any two stages of development, providing rapid in-process feedback. Since the categories are in the language of a developer, they are eminently actionable. Key to this technique is the value set used for classification.
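The instrument idea can be sketched by comparing an observed defect-type distribution against an expected profile for each phase and asking which phase it most resembles. This Python is a minimal sketch under stated assumptions: the phase names, the four-type value set and the expected shares are hypothetical placeholders for the expectation in the figure, and total variation distance is one reasonable comparison, not one the text prescribes.

```python
# Hypothetical phase names and expected profiles, shaped after the description
# above (function early, assignment at unit test, interface at function test,
# timing at system test). They are illustrative numbers, not measured data.
PHASES = ["design review", "unit test", "function test", "system test"]
TYPES = ["function", "assignment", "interface", "timing"]
EXPECTED = {
    "design review": {"function": 0.60, "assignment": 0.15, "interface": 0.20, "timing": 0.05},
    "unit test":     {"function": 0.20, "assignment": 0.50, "interface": 0.25, "timing": 0.05},
    "function test": {"function": 0.15, "assignment": 0.25, "interface": 0.50, "timing": 0.10},
    "system test":   {"function": 0.10, "assignment": 0.15, "interface": 0.25, "timing": 0.50},
}

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two defect-type distributions."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in TYPES)

def nearest_phase(observed: dict[str, float]) -> str:
    """Phase whose expected profile is closest to the observed distribution."""
    return min(PHASES, key=lambda phase: total_variation(observed, EXPECTED[phase]))

def maturity_alarm(current_phase: str, observed: dict[str, float]) -> bool:
    """True when the observed mix resembles a phase earlier than the current one."""
    return PHASES.index(nearest_phase(observed)) < PHASES.index(current_phase)

def offending_types(current_phase: str, observed: dict[str, float],
                    threshold: float = 0.1) -> list[str]:
    """Types whose observed share exceeds the current phase's expectation by > threshold."""
    expected = EXPECTED[current_phase]
    return [t for t in TYPES if observed[t] - expected[t] > threshold]

# A hypothetical system-test mix that is still dominated by function defects.
observed = {"function": 0.5, "assignment": 0.2, "interface": 0.2, "timing": 0.1}
print(nearest_phase(observed))                   # design review
print(maturity_alarm("system test", observed))   # True
print(offending_types("system test", observed))  # ['function']
```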
This is governed by a necessary and sufficient condition, which essentially dictates what the right value set for a process would be. Establishing the right value set is an empirical process, since one is trying to compress a vast range of semantics into a few attribute-value pairs that capture the essence. We can describe where a chair is located in a room by measuring its distance from two adjacent walls and the ceiling, providing three orthogonal measurements in a three-dimensional world. The question is: can we place a measurement on a defect, by classifying the fix, such that a collection of them can tell us, via their distribution, where the product belongs in the process space? Essentially, the number of classes used to classify defects should provide enough dimensionality to resolve the process space we are interested in. Furthermore, since the intent of the process is that the product matures as it goes through it, the distribution of defect types should change as the product moves through the process. Mathematically, the former is the sufficient condition and the latter the necessary condition. Since choosing a classification scheme seems easy, a common error is to develop classification schemes without much thought to how they will be used later. Here, it should be recognized that the key is to make a measurement out of them, and therefore substantial thought should go into the choice of defect type values. They should, as far as possible, be distinct and independent. Furthermore, the value set should be applicable throughout all process phases (consistency); otherwise it ceases to be a measurement. Lastly, the values should be process independent, but generic to the activity (in this case, programming). Then the controlled object can be the process, which, when changed, does not impact the measurement system. Details are explained further in the ODC paper.
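One way to make these conditions concrete, offered as an illustrative operationalization rather than the formal statement in the ODC paper, is to check a candidate value set against expected per-phase profiles: the value set must be the same in every phase (consistency), the distribution must change from each phase to the next (necessary), and every pair of phases must be separable (sufficient, i.e. the value set resolves the process space). The function names, thresholds and profiles are hypothetical.

```python
from itertools import combinations

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two distributions over the same defect types."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in p)

def check_value_set(profiles: list[dict[str, float]],
                    min_change: float = 0.1,
                    min_separation: float = 0.1) -> dict[str, bool]:
    """profiles: expected defect-type distribution for each phase, in process order."""
    keys = set(profiles[0])
    # Consistency: the same value set must apply in every phase.
    if not all(set(p) == keys for p in profiles):
        return {"consistent": False, "necessary": False, "sufficient": False}
    # Necessary: the distribution changes as the product moves to the next phase.
    necessary = all(total_variation(a, b) > min_change
                    for a, b in zip(profiles, profiles[1:]))
    # Sufficient: every pair of phases can be told apart by its distribution.
    sufficient = all(total_variation(a, b) >= min_separation
                     for a, b in combinations(profiles, 2))
    return {"consistent": True, "necessary": necessary, "sufficient": sufficient}

# Hypothetical value sets over three phases.
flat = [{"function": 0.5, "interface": 0.5}] * 3
oscillating = [{"function": 0.7, "interface": 0.3},
               {"function": 0.3, "interface": 0.7},
               {"function": 0.7, "interface": 0.3}]
graded = [{"function": 0.6, "interface": 0.3, "timing": 0.1},
          {"function": 0.3, "interface": 0.5, "timing": 0.2},
          {"function": 0.1, "interface": 0.3, "timing": 0.6}]
for name, profiles in [("flat", flat), ("oscillating", oscillating), ("graded", graded)]:
    print(name, check_value_set(profiles))
```

The oscillating case shows why the two conditions differ: the distribution changes at every step, yet the first and last phases cannot be told apart, so that value set does not resolve the process space.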