Code Coverage - Is it Feasible in a Large Scale Development Project?


Tom Pavela
IBM, IMS Product Division
Santa Teresa Laboratory
E-mail - pavela@us.ibm.com

 

Introduction

The challenge of delivering well-tested, quality products has never been greater. Finding the right tools and processes to provide a better tested product is very difficult, and investing in the wrong tools or processes can be costly and possibly fatal for the product. A code coverage system seems to be a wise investment. Studies show that with 100% code coverage at the unit testing phase, one would detect 15% of the defects in the product; another 45% of the defects could be found in the functional test phase. Questions arise: Will a code coverage system really be a benefit? How does one collect code coverage data in a large-scale development project when the code is constantly changing? How does one store all of the data for a module that most of the test cases exercise?

Benefits of a Code Coverage System

With a code coverage system one can make intelligent decisions on what testing is needed. One can query the information and answer a number of questions:

  1. What code has not been tested?
  2. What is the test case overlap and which test cases can be deleted or combined?
  3. When a code change is made to a module, what test cases need to be run?
  4. When a set of test cases is run for changed code, what was the code coverage of the changed code?
  5. Are there new test cases needed for the new code?
  6. A defect was found: was the code tested? Did we have test cases that exercised the defective area? (Causal analysis.)
  7. Which test cases are finding problems, and which ones are good candidates for regression testing?

In addition to finding more defects in the product, the testing cycle time can be reduced because the right test cases are executed instead of a random selection.

Difficulty

Now that we have seen that a code coverage system will help in delivering a better tested product, let's look at the characteristics of a large project. A large project usually has over 3 million lines of code, over 2000 modules, and over 3000 test cases. The average release for the project contains about half a million lines of new or changed code. It takes approximately 3 months to run all of the test cases.

During the functional testing phase, new test cases are executed to validate the product, and code defects are found and fixed. How does one gather code coverage data during this phase? If one freezes code changes while collecting the code coverage data, the data must be gathered again as soon as the new code changes go in: a never-ending loop. If one waits until the functional testing phase is over, the problem of continuous code changes remains, and all of the test cases must be rerun, which takes months.

Another problem that arises with a large project is the amount of code coverage data that is generated. For each line of code, do you save data for every test case? Some code is common code, such as initialization code, which will be touched by all test cases. With 3000 test cases and 3 million lines of code, a very large table is needed to store each line of code and each test case that exercised that line of code.
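
As a rough upper bound on that table, assume (unrealistically) that every test case exercises every line of code. The figures are the ones quoted above; the worst-case assumption is ours:

```latex
\[
3{,}000 \ \text{test cases} \times 3{,}000{,}000 \ \text{lines of code}
  = 9 \times 10^{9} \ \text{(line, test case) pairs}
\]
```

Real data would be far sparser, but common code such as initialization routines pushes individual rows toward the full 3000 entries.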

Handling Code Changes

The best scenario is to collect the code coverage data each time the test cases are executed and to handle code churn without rerunning any test case. To make this scenario work, one needs to update (re-sequence) the code coverage data base with the code changes. Re-sequencing eliminates the need to freeze the code while collecting code coverage data: every time a code change is incorporated into the system, the re-sequencing routine makes the necessary adjustments to the data base.

How are the changes handled in the data base? There are three types of code change: new line, changed line, and deleted line. New lines of code are annotated "Not tested" and changed lines are annotated "Re-test required". Deleted lines carry no annotation of their own, since the code has been removed from the data base; instead, the line before and the line after the deletion are annotated. An annotation is also made at the module level to record that the module has changed and should be re-tested.

In the example below, the left-hand side shows the code before the changes and the right-hand side shows it after. Lines 6 and 23-27 were added to the original code. Line 12 of the changed code was modified, and line 13 of the original code was deleted. The annotation mark follows the line sequence number: "*" means the code was tested, "n" marks new and untested code, "r" marks changed code that should be re-tested, and "d" marks where code has been deleted (the lines just before and after the deletion).

Original Code                                         Changed Code
00001 * HD_Unload:                                    00001 * HD_Unload:
00002 * Savearg = arg                                 00002 * Savearg = arg
00003 * REST=UPPER(SaveArg)                           00003 * REST=UPPER(SaveArg)
00004 * PARSE REST WITH 'DB=' DB .                    00004 * PARSE REST WITH 'DB=' DB .
00005 * PARSE REST WITH 'DBMST=' DBMST .              00005 * PARSE REST WITH 'DBMST=' DBMST .
00006   /************************************/        00006 n PARSE REST WITH 'PART=' PART .
00007   /* Create DB list */                          00007   /**********************************/
00008   /************************************/        00008   /* Create DB list */
00009 * if DB ¬= "" then                              00009   /**********************************/
00010 * do                                            00010 * if DB ¬= "" then
00011 * odb = "@"|| DB                                00011 * do
00012 * Push ODB||' ON'                               00012 r odb = "$"||DB
00013 * DBR_DB_List = DB                              00013 d Push ODB|| '=ON'
00014 * end                                           00014 d end
00015   else                                          00015   else
00016   do                                            00016   do
00017   Resource_List = ''                            00017   Resource_List = ''
00018   if DBMST ¬= "" then                           00018   if DBMST ¬= "" then
00019   do                                            00019   do
00020   Call Create_Lists 'DBMST' DBMST               00020   Call Create_Lists 'DBMST' DBMST
00021   Resource_List = DBR_DB_List || Resource_list  00021   Resource_List = DBR_DB_List || Resource_list
00022   end                                           00022   end
00023   end                                           00023 n if PART ¬= "" then
                                                      00024 n do
                                                      00025 n Call Create_Lists 'PART' PART
                                                      00026 n Resource_List = DBR_DB_List||Resource_list
                                                      00027 n end
                                                      00028   end

The re-sequencing routine determines the changes needed in the code coverage data base to account for the code changes (adds, changes, and deletes). With the re-sequencing method, code coverage data can be tracked while the functional test cases are being executed, and decisions about which test cases need to be changed, combined, or deleted can be made without completing the entire data collection process.
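
The re-sequencing idea can be sketched with a line-level diff. This is a minimal illustration, not the routine used in the project: it assumes Python's standard `difflib.SequenceMatcher` as the diff engine, and the function names and the tiny sample module are hypothetical. The marks follow the annotation scheme described above ("*" tested, "n" new, "r" changed, "d" next to a deletion, blank for not tested).

```python
from difflib import SequenceMatcher

def resequence(old_lines, old_marks, new_lines):
    """Carry per-line coverage marks from the old source to the new source."""
    new_marks = [" "] * len(new_lines)
    deletion_points = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines keep whatever coverage they already had.
            new_marks[j1:j2] = old_marks[i1:i2]
        elif tag == "insert":
            new_marks[j1:j2] = ["n"] * (j2 - j1)
        elif tag == "replace":
            new_marks[j1:j2] = ["r"] * (j2 - j1)
        elif tag == "delete":
            # Nothing to mark in the new source yet; remember the position.
            deletion_points.append(j1)
    # Second pass: flag the lines just before and after each deletion.
    for j in deletion_points:
        for neighbour in (j - 1, j):
            if 0 <= neighbour < len(new_lines):
                new_marks[neighbour] = "d"
    return new_marks

def module_needs_retest(marks):
    """Module-level annotation: any new, changed, or deletion mark."""
    return any(mark in ("n", "r", "d") for mark in marks)

# Hypothetical six-line module: one line added, one changed, one deleted.
old_source = ["A", "B", "C", "D", "E", "F"]
old_marks = ["*", "*", "*", "*", "*", " "]
new_source = ["A", "N", "B", "C2", "D", "F"]
new_marks = resequence(old_source, old_marks, new_source)
```

A general-purpose diff may classify an edit differently from a hand-made annotation. For instance, a changed line directly followed by a deleted line is reported by `difflib` as one replaced block, so both surviving lines would be marked "r" rather than "r" and "d".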

Reducing Table Size

To reduce the table size, a table with a fixed number of columns can be created. With a file pointer method, the last column holds a pointer to a flat file that lists the test cases that did not fit in the table. For example, assume the table is 100 columns wide, the columns store test case names, and the rows are the lines of code. If more than 100 test cases exercised a particular line of code, the 100th column holds the name of a file that contains the list of the additional test cases. This method reduces the size of the table needed to store the code coverage data and makes it more manageable.
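
The file pointer method might look like the sketch below. It is only an illustration: the row width of 4 (instead of the 100 used in the example above), the "file:" prefix that distinguishes a file name from a test case name, the overflow file naming, and the function names are all assumptions made here.

```python
import os
import tempfile

ROW_WIDTH = 4             # the example in the text uses 100 columns
OVERFLOW_PREFIX = "file:"  # assumed marker for "this cell is a file name"

def store_row(test_cases, overflow_dir, row_id, width=ROW_WIDTH):
    """Build one fixed-width row; spill extra test cases to a flat file."""
    if len(test_cases) <= width:
        return list(test_cases) + [None] * (width - len(test_cases))
    inline = list(test_cases[: width - 1])
    path = os.path.join(overflow_dir, f"{row_id}.overflow")
    with open(path, "w") as handle:
        handle.write("\n".join(test_cases[width - 1:]))
    # The last column points at the flat file instead of naming a test case.
    return inline + [OVERFLOW_PREFIX + path]

def load_row(row):
    """Return every test case for a row, following the file pointer if any."""
    tests = [cell for cell in row if cell is not None]
    if tests and tests[-1].startswith(OVERFLOW_PREFIX):
        path = tests.pop()[len(OVERFLOW_PREFIX):]
        with open(path) as handle:
            tests.extend(handle.read().splitlines())
    return tests

# Six test cases hit the same line; only three fit beside the file pointer.
all_tests = ["TC001", "TC002", "TC003", "TC004", "TC005", "TC006"]
with tempfile.TemporaryDirectory() as tmp:
    row = store_row(all_tests, tmp, "HDUNLOAD_00001")
    restored = load_row(row)
    short_row = store_row(["TC001"], tmp, "HDUNLOAD_00002")
```

The trade-off is that heavily exercised lines, such as common initialization code, cost an extra file read per query, while the table itself stays at a fixed, predictable width.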

Summary

The re-sequencing method eliminates the need to rerun all of the test cases when there is a code change and keeps the data base in sync with the correct code level. The file pointer method reduces the table size required to hold all of the code coverage data and still provides the tester with all of the benefits of a code coverage system.

References

Jillian Ye, David Godwin, and Colin Mackenzie, "What is Uncovered by Coverage Testing?", Technical Report TR74.158.