ODC - Orthogonal Defect Classification

Next: The Defect Type Attribute Up: Orthogonal Defect Classification Previous: Introduction

Orthogonal Defect Classification

The difficulty in developing methods and techniques to bridge the gap between theory and practice in in-process measurements stems from a very fundamental issue: the lack of well-defined cause-effect relationships that are validated over time. Without a good sense of cause and effect, it is very hard to develop methods that provide good feedback to a developer. Yet, until recently, neither methods to identify such relationships nor crisp techniques to measure them had been developed.

A recent study explored the existence of relationships between the semantics of defects and their net result on the software development process [15]. The choice of semantics of defects was intentional, since it could become a vehicle that provides a measure of the state-variables for a development process. The study showed that when defects were uniquely categorized by a set of defect types, representing the semantics of the fix, it was possible to relate changes in the shape of the reliability growth curve to defects of a specific type. The defect types could be associated with the activities of the different stages of development. Thus, defects of a specific type were due to some cause in the process, and the shape of the reliability growth curve represented an effect on the process. In the study, sub-groups that had a larger-than-average proportion of initialization defects yielded growth curves that were very inflected, confirming the theory that errors early in the code path (viz. initialization) hide other defects, causing the growth curve to inflect [9]. Had such in-process measurements on defect type been available, developers could have compensated for problems by altering the test strategy. Similarly, a substantial number of function defects prompted questioning of the design process. In hindsight, it was learned that the design process left much to be desired. The study demonstrated that a simple classification scheme could reveal insight into process problems faced during development. It was subsequently recognized that a semantic classification could be exploited to provide in-process feedback. The study demonstrated the existence of a measurable cause-effect relationship that could open the doors to a host of viable alternatives.
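The kind of in-process signal described above can be sketched in a few lines. This is an illustration only: the sub-group names and defect counts are invented, and comparing each sub-group against the overall proportion is an assumed reading of "larger than average", not the procedure used in the study [15].

```python
from collections import Counter

# Hypothetical defect records: (sub_group, defect_type). All data is invented.
defects = [
    ("A", "initialization"), ("A", "initialization"),
    ("A", "function"), ("A", "assignment"),
    ("B", "function"), ("B", "checking"),
    ("B", "assignment"), ("B", "initialization"),
    ("C", "function"), ("C", "function"),
    ("C", "documentation"), ("C", "checking"),
]


def proportion_by_group(records, defect_type):
    """Return {sub_group: fraction of that group's defects of defect_type}."""
    totals = Counter(group for group, _ in records)
    hits = Counter(group for group, kind in records if kind == defect_type)
    return {group: hits[group] / totals[group] for group in totals}


def groups_above_overall(records, defect_type):
    """List sub-groups whose proportion exceeds the overall proportion."""
    overall = sum(1 for _, kind in records if kind == defect_type) / len(records)
    per_group = proportion_by_group(records, defect_type)
    return sorted(g for g, p in per_group.items() if p > overall)


# Sub-groups flagged here would be candidates for a changed test strategy.
flagged = groups_above_overall(defects, "initialization")
print(flagged)
```

With real data, a flagged sub-group would only be a prompt to look closer; the threshold itself would need calibration against experience.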

Orthogonal Defect Classification (ODC) essentially means that we categorize a defect into classes that collectively point to the part of the process which needs attention, much like characterizing a point in a Cartesian system of orthogonal axes by its (x, y, z) coordinates. In the software development process, although activities are broadly divided into design, code, and test, each organization can have its variations. The process stages may also overlap in several instances, while different releases may be developed in parallel. Process stages can be carried out by different people and sometimes by different organizations. Therefore, for classification to be widely applicable, the classification scheme must be consistent between the stages. Without consistency it is almost impossible to look at trends across stages. Ideally, the classification should also be quite independent of the specifics of a product or organization. If the classification is both consistent across phases and independent of the product, it tends to be fairly process invariant and can eventually yield relationships and models that are very useful. Thus, a good measurement system which allows learning from experience and provides a means of communicating experiences between projects has at least three requirements:


  • orthogonality,
  • consistency across phases, and
  • uniformity across products.

One of the pitfalls in classifying defects is that it is a human process, and is subject to the usual problems of human error, confusion, and a general distaste for the task if the use of the data is not well understood. However, each of these concerns can be handled if the classification process is simple, with little room for confusion or possibility of mistakes, and if the data can be easily interpreted. If the number of classes is small, there is a greater chance that the human mind can accurately distinguish between them. Having a small set to choose from makes classification easier and less error prone. When orthogonal, the choices should also be uniquely identified and easily classified.

 

Necessary Condition
There exists a semantic classification of defects, from a product, such that the defect classes can be related to the process which can explain the progress of the product through this process.

If the goal is to explain the progress of a product through the process, the simple case of asking the programmer fixing the defect, "where are the problems in this product?" is the degenerate solution to the problem. This question is implied by classifications such as "where injected?" that rely on the intuition of the programmer to directly map defects to process stages. However, practitioners are quick to point out that answering the above question requires stepping back from the process and conjecturing, and that such answers can vary dramatically in both accuracy and validity. Such direct classification schemes, by the nature of their assumptions, qualify as good opinion surveys, but do not constitute a measurement on the process.

The above goal can be achieved by capturing the details of a defect fix in a semantic classification that is subsequently related to the process. An example of such semantic classification is "defect type", which captures the meaning of the fix. Since defect type does not directly translate into "where are the problems in this product?", it needs to be mapped to the process. This mapping provides the relation between defect types and the process, which enables answering the above question. Thus, semantic classification provides measurements on the process that can yield an assessment of the progress of a product through the process.

Semantic classification is likely to be accurate since it is tied to the work just completed. It is akin to measurements of events in the process, as opposed to opinions on the process. There is an important advantage in the semantic classification of a defect, such as defect type, over an opinion-based classification, such as where injected. The semantic classification is invariant to process and product, but requires a mapping to process stages. This mapping is a level of indirection that ties a semantic class to a specific process stage(s). The cost of this indirection is reflected in the need to calibrate the distribution of these semantic classes for specific processes.
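The level of indirection described above can be pictured as a lookup table kept separate from the classification itself. The sketch below is a minimal illustration: the defect type names are the five from the first pilot, but the particular type-to-stage assignments in `STAGE_MAP` are hypothetical placeholders that a real organization would calibrate for its own process.

```python
from enum import Enum


class DefectType(Enum):
    """Semantic classes: the meaning of the fix, independent of any process."""
    FUNCTION = "function"
    INITIALIZATION = "initialization"
    CHECKING = "checking"
    ASSIGNMENT = "assignment"
    DOCUMENTATION = "documentation"


# Hypothetical calibration for one process. The stage names and the
# assignments below are assumptions for illustration, not taken from the paper.
STAGE_MAP = {
    DefectType.FUNCTION: {"design"},
    DefectType.INITIALIZATION: {"code"},
    DefectType.CHECKING: {"code"},
    DefectType.ASSIGNMENT: {"code"},
    DefectType.DOCUMENTATION: {"design", "code"},
}


def stage_counts(classified, stage_map=STAGE_MAP):
    """Count how many classified defects point at each process stage."""
    counts = {}
    for defect_type in classified:
        for stage in stage_map[defect_type]:
            counts[stage] = counts.get(stage, 0) + 1
    return counts


observed = [DefectType.FUNCTION, DefectType.ASSIGNMENT, DefectType.DOCUMENTATION]
print(stage_counts(observed))
```

Because the programmer records only the `DefectType`, the same classified data can be reinterpreted under a different `stage_map` when the process changes, which is the advantage over recording an opinion about where the defect was injected.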

The opinion-based classification suffers in several ways. Firstly, as noted, the classification is error-prone. Secondly, it is very specific to a process and therefore does not map between different processes. Finally, it cannot work where the process is not well defined or the process is being changed dynamically to compensate for problems.

Clearly, semantic classification has advantages. To be able to measure the progress of a product, the mapping of semantic classes to the process should be feasible. Essentially, a set of such semantic classes should exist that maps to the process. Classification can always have some degree of subjectivity; however, orthogonality reduces the human error in classification by providing classes that are distinct and mutually exclusive.

 

Sufficient Conditions
The set of all values of defect attributes must form a spanning set over the process sub-space.

The sufficient conditions are based on the set of elements that make up an attribute, such as defect type. Based on the necessary condition, the elements need to be orthogonal and associated with the process on which measurements are inferred. The sufficient conditions ensure that the number of classes is adequate to make the necessary inference. Ideally, the classes should span the space of all possibilities that they describe. The classes would then form a spanning set, with the capability that everything in that space can be described by these classes. If they do not form a spanning set, then there is some part of the space that we want to make inferences on that cannot be described with the existing data. Ensuring that the sufficiency condition is satisfied implies that we know and can accurately describe the space we want the data to project into.
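One simplified way to read the spanning requirement is as a coverage check: every part of the process we want to reason about should be reachable from at least one class. The sketch below assumes a hypothetical mapping from classes to stages (the assignments are invented) and reports the stages that no class can describe. Coverage of this kind is only a rough proxy for sufficiency, not a proof of it.

```python
def uncovered_stages(process_stages, stage_map):
    """Return the process stages that no defect class maps to.

    process_stages: iterable of stage names we want to make inferences on.
    stage_map: {class_name: set of stage names}, a hypothetical calibration.
    """
    covered = set()
    for stages in stage_map.values():
        covered |= set(stages)
    return set(process_stages) - covered


# Invented example: a mapping that says nothing about the test stage.
example_map = {
    "function": {"design"},
    "assignment": {"code"},
    "checking": {"code"},
}
gaps = uncovered_stages(["design", "code", "test"], example_map)
print(gaps)
```

A non-empty result marks a region of the process for which the existing classes yield no data, which is the situation the sufficiency condition is meant to rule out.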

Given the experimental nature of the work, it is hard to guarantee a priori that sufficiency is met with any one classification. Given that we are trying to observe the world of the development process and infer about it from the defects coming out, there are the tasks of (a) coming up with the right measurement, (b) validating the inferences from the measurements with reference to the experiences shared, and (c) improving the measurement system as we learn more from the pilot experiences. However, this is the nature of the experimental method [19]. For example, in the first pilot [15], the following defect types evolved after a few classification attempts: function, initialization, checking, assignment, and documentation. This set, as indicated earlier in this section, provided adequate resolution to explain why the development process had trouble and what could be done about it. However, in subsequent discussions [16] and pilots it was refined to the current eight. Given the orthogonality, in spite of these changes several classes, such as function and assignment, and the dimension they spanned (associations) remained unchanged.

 

Classification for Cause-Effect

Collecting the right data that can provide a complete story to relate cause attributes with effect can provide an organization a gold mine of information to learn from. Figure 1 shows three major groups of data that are important to have. One group is the cause attributes, which, when orthogonally chosen, provide tremendous leverage. So far, we have mentioned defect type, and later in the paper we will discuss defect trigger. The second group is meant to measure effect, which could include explicit measures of effect or those computed as a function of other measures. Traditionally there have been several ways to measure effects. An explicit measure commonly used in IBM is severity; the severity of a defect is usually measured on a scale of 1-4. More recently, the impact of field problems on a customer is captured in a popular IBM classification: CUPRIMD [20], standing for Capability, Usability, Performance, Reliability, Installability, Maintainability and Documentation. Other measures of impact which are functions computed over existing data include Reliability Growth, Defect Density, etc. The third group is meant to identify sub-populations of interest. These are typically attributes that distinguish projects, people, processes, tools, etc. The list is limitless in that it could include almost any attribute which is considered meaningful to track. The availability of such sub-population identifiers is very valuable and would provide an ideal fishing ground to study trends similar to those undertaken in market segmentation and analysis studies.

Figure 1: ODC Data to Build Cause Effect Relationships
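The three groups of data can be pictured as fields of a single defect record. The sketch below is an assumed layout, not a schema from the paper: the field names, the free-form `trigger` string, and the sub-population keys are hypothetical, while the 1-4 severity scale and the CUPRIMD categories come from the text.

```python
from collections import Counter
from dataclasses import dataclass, field

CUPRIMD = (
    "Capability", "Usability", "Performance", "Reliability",
    "Installability", "Maintainability", "Documentation",
)


@dataclass(frozen=True)
class DefectRecord:
    # Group 1: cause attributes (trigger values here are placeholders).
    defect_type: str
    trigger: str
    # Group 2: explicit effect measures.
    severity: int   # scale of 1-4, as in the text
    impact: str     # one of the CUPRIMD categories
    # Group 3: sub-population identifiers (hypothetical keys such as "project").
    subpopulation: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.severity not in (1, 2, 3, 4):
            raise ValueError("severity must be between 1 and 4")
        if self.impact not in CUPRIMD:
            raise ValueError("impact must be a CUPRIMD category")


def type_counts(records, **selectors):
    """Count defect types within the sub-population matching all selectors."""
    selected = [
        r for r in records
        if all(r.subpopulation.get(k) == v for k, v in selectors.items())
    ]
    return Counter(r.defect_type for r in selected)


records = [
    DefectRecord("function", "trigger-a", 2, "Capability", {"project": "P1"}),
    DefectRecord("assignment", "trigger-b", 3, "Reliability", {"project": "P1"}),
    DefectRecord("function", "trigger-a", 1, "Performance", {"project": "P2"}),
]
print(type_counts(records, project="P1"))
```

Keeping the sub-population identifiers as an open dictionary reflects the remark that the list of such attributes is limitless; a fixed set of columns would work equally well once an organization has settled on what to track.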



rchill
Thu Apr 1 13:33:37 EST 1999