ODC - Orthogonal Defect Classification

Software Triggers and ODC

Software faults are dormant by nature. This is particularly true when considering faults that cause failures once a software product is released in the field. Faults which surface as failures for the first time after a product is released often have been dormant throughout the period of development, which can range from a few months to a few years. Furthermore, these faults do not necessarily surface in the first few months of field exposure, but often remain dormant for several years.

What acts as the facilitator that activates dormant software faults so that they result in failures? That catalyst is what we call the trigger. We are not trying to identify the specific sensitization that is necessary for each unique fault to be exercised. Instead, we identify the broad environmental conditions or activities that work as catalysts, assisting faults to surface as failures. In an abstract sense, triggers are operators on the set of faults that map them into failures. Figure 1 illustrates different triggers that force a fault to a failure.

Figure 1: Triggers

The concept of the trigger is quite new. To put it in perspective, let us digress for a moment and discuss the more commonly known attributes of failures. This will help differentiate what triggers are and clarify any potential confusion. Some of the more commonly discussed attributes of failures are their failure modes and characteristics such as symptom, impact and severity. The symptom, a visible attribute, is the characteristic displayed as a result of the failure and the net effect on the customer. For instance, the symptom attribute reported in the IBM service process has a value set such as: hang, wait, loop, incorrect-output, message, and abnormal termination (abend). Fault injection experiments also use a similar attribute (often called failure or failure mode) with a value set such as: no error, checksum, program exit, timeout, crash, reboot, hang, etc. [KKA95], [KIT93], [HSSS93]. The impact is an attribute that characterizes the magnitude of the outage caused, or its severity, such as: timing, crash, omission, abort fail, lucky, and pass [SHSS93].

At first glance, it is not uncommon to confuse the symptom with the trigger. However, they are very different and orthogonal to each other. In simple terms, the trigger is a condition that activates a fault to precipitate a reaction or series of reactions, resulting in a failure. The symptom is a sign or indication that something has occurred. In other words, the trigger refers to the environment or condition that helps force a fault to surface as a failure, whereas a symptom describes the indicators that a failure has occurred, such as a message, an abend, a softwait, etc. Thus, a single trigger could precipitate a failure with any of the above symptoms or severities, and conversely a symptom or severity could be associated with a variety of trigger mechanisms. In Figure 1 the failure is shown as a single state. However, when characterized with several failure modes or symptoms, that state could be split into several. The triggers shown in Figure 1 would each contribute to the different failure states identified. In this paper, we focus on triggers and do not embark on the mappings between different triggers and the symptoms of different failures. However, these mappings have been studied and will be discussed in a separate article.
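The orthogonality of trigger and symptom can be made concrete with a small cross-tabulation. The following is only a minimal sketch under stated assumptions: the trigger labels T1 to T3 are placeholders (the actual trigger value set is defined in the next section), the failure reports are invented for illustration, and the helper names are hypothetical. Only the symptom values come from the service-process list quoted above.

```python
from collections import Counter

# Invented failure reports as (trigger, symptom) pairs.
# T1..T3 are placeholder trigger labels, not the paper's value set.
reports = [
    ("T1", "hang"), ("T1", "abend"), ("T1", "message"),
    ("T2", "abend"), ("T2", "loop"),
    ("T3", "abend"), ("T3", "hang"),
]

def cross_tabulate(records):
    """Count how often each (trigger, symptom) pair occurs."""
    return Counter(records)

def symptoms_for(table, trigger):
    """Set of symptoms observed together with the given trigger."""
    return {s for (t, s) in table if t == trigger}

def triggers_for(table, symptom):
    """Set of triggers observed together with the given symptom."""
    return {t for (t, s) in table if s == symptom}

table = cross_tabulate(reports)
# One trigger can lead to several symptoms ...
print(sorted(symptoms_for(table, "T1")))
# ... and one symptom can arise from several triggers.
print(sorted(triggers_for(table, "abend")))
```

In this toy data set neither attribute determines the other, which is the sense in which the two classifications are treated as independent axes.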

The concept of the software trigger was introduced in [SC91], where it was applied to failure analysis from defects in the MVS operating system, with the intention of guiding fault injection. Since then, several advancements have been made to the notion of triggers, most notably when orthogonal defect classification (ODC) was developed as a measurement technology [CBC+92]. Later, the concept of the trigger was extended to the design phase in the software development paradigm [CHBC93].

There are specific requirements for a set of triggers to be considered part of orthogonal defect classification, and a process to establish them. We do not attempt to explain the concepts completely here; the details of the necessary and sufficient conditions are best found in [CBC+92]. To summarize briefly, ODC requires that the distribution of an attribute (such as trigger) change as a function of the activity (process phase or time), so that it characterizes the process. In addition, the set of triggers should form a spanning set over the process space for completeness. Changes in the distribution as a function of activity then become the instrument, yielding signatures that characterize the product through the process. This is when the trigger value set is elevated from a mere classification system into a measurement on the process and qualifies to be called ODC. In this case, the triggers become a measurement on the verification process. The value set has to be experimentally verified to satisfy the stated necessary and sufficient (N+S) conditions. Unfortunately, there is no shortcut to finding the right value set. It takes several years of systematic data collection, experimentation and experience with test pilots to establish one. However, once established and calibrated, the value set is easy to roll out and put into production. We have the benefit of having executed ODC in around 50 projects across IBM, which provides the base to understand and establish these patterns.
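The idea that the trigger distribution changes with the activity can be sketched in a few lines. This is an illustration under assumptions, not the procedure of [CBC+92]: the activity names, the placeholder trigger labels T1 to T3, the defect records, and the use of total variation distance as a measure of change are all choices made here for the example.

```python
from collections import Counter, defaultdict

TRIGGER_VALUES = ("T1", "T2", "T3")  # placeholder value set (assumption)

def trigger_distributions(records, value_set=TRIGGER_VALUES):
    """Return {activity: {trigger: fraction}} from (activity, trigger) records."""
    counts = defaultdict(Counter)
    for activity, trigger in records:
        # Spanning-set check: every defect must fit some value in the set.
        if trigger not in value_set:
            raise ValueError(f"trigger {trigger!r} is outside the value set")
        counts[activity][trigger] += 1
    return {
        activity: {v: c[v] / sum(c.values()) for v in value_set}
        for activity, c in counts.items()
    }

def total_variation(p, q):
    """Half the L1 distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

# Invented records for two hypothetical activities.
records = (
    [("review", "T1")] * 3 + [("review", "T2")]
    + [("test", "T2")] + [("test", "T3")] * 3
)

dist = trigger_distributions(records)
print(dist["review"])
print(dist["test"])
print(total_variation(dist["review"], dist["test"]))
```

A distance near zero between activities would suggest that the attribute does not distinguish them, while a clear shift of the kind in this toy data is what the text calls a signature.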

Triggers form a vital measurement and have several uses. Because the trigger conforms to ODC, it can be used in conjunction with other ODC attributes such as defect-type, which measures the progress of a product along the development process axis, to yield effectiveness measures. This cross-product of defect-type and trigger can provide very fast feedback from the data of a single stage [CBC+92].
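As a rough illustration of the cross-product described above, the sketch below builds the full defect-type by trigger grid for the defects of a single stage, including empty cells. The defect-type labels, the placeholder triggers T1 and T2, the records, and the function name are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

DEFECT_TYPES = ("function", "assignment")  # illustrative labels only
TRIGGERS = ("T1", "T2")                    # placeholder labels

# Invented defects found in one stage, as (defect_type, trigger) pairs.
stage_defects = [
    ("function", "T1"), ("function", "T1"), ("assignment", "T2"),
]

def cross_product_table(defects, defect_types=DEFECT_TYPES, triggers=TRIGGERS):
    """Return counts for every (defect_type, trigger) cell, zeros included."""
    counts = Counter(defects)
    return {cell: counts[cell] for cell in product(defect_types, triggers)}

grid = cross_product_table(stage_defects)
for (dtype, trig), n in grid.items():
    print(f"{dtype:<12}{trig:<4}{n}")
```

Keeping the zero cells matters here: a combination that never occurs within a stage is as much a part of the picture as one that dominates.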

So far, only preliminary aggregate trigger distributions have been published in some of our earlier papers. However, an in-depth analysis of triggers, and of how the distributions emerge as a function of time after product release, has not been presented. This paper focuses on just one specific aspect of ODC, namely triggers. Furthermore, the focus is directed towards faults that manifest as failures during the operating life of the product. This focus is chosen primarily to address the needs of the dependability community. Since software has become a dominant cause of system outage [Gra90], a clearer understanding of failure mechanisms is critically needed. An analysis of triggers provides a significant part of the needed insight.



rchill
Mon Mar 29 18:54:02 EST 1999