Code Coverage: Is It Feasible in a Large-Scale Development Project?
Tom Pavela
- IBM, IMS Product Division
- Santa Teresa Laboratory
- E-mail - pavela@us.ibm.com
Introduction
The challenge of delivering quality-tested products has never been greater. Finding the right tools and processes to produce a better-tested product is very difficult, and investing in the wrong tools or processes can be costly and possibly fatal for the product.
A code coverage system seems to be a wise investment. Studies show that 100% code coverage at the unit testing phase would detect 15% of the defects in the product, and another 45% of the defects could be found in the functional test phase. Questions arise: Will a code coverage system really be a benefit? How does one collect code coverage data in a large-scale development project when the code is constantly changing? How does one store all of the data for a module that most of the test cases exercise?
Benefits of a Code Coverage System
With a code coverage system, one can make informed decisions about what testing is needed. The coverage data can be queried to answer questions such as:
- What code has not been tested?
- What is the test case overlap and which test cases can be
deleted or combined?
- When a code change is made to a module, what test cases need
to be run?
- When a set of test cases is run for changed code, what was the code coverage of the changed code?
- Are new test cases needed for the new code?
- When a defect is found, was the code tested? Did any test cases exercise the defective area? (Causal analysis)
- Which test cases are finding problems? Which ones are good candidates for regression testing?
In addition to finding more defects in the product, the testing cycle time can be reduced because the right test cases are executed instead of a randomly selected set.
Difficulty
Now that we have seen how a code coverage system helps deliver a better-tested product, let's look at the characteristics of a large project. A large project usually has over 3 million lines of code, over 2,000 modules, and over 3,000 test cases. An average release contains about half a million lines of new or changed code, and running all of the test cases takes approximately three months.
During the functional testing phase, new test cases are executed to validate the product, and code defects are found and fixed. How does one gather code coverage data during this phase? If one freezes code changes while collecting the code coverage data, the data must then be re-gathered once the new code changes are applied: a never-ending loop. If you wait until the functional testing phase is over, you still have the problem of continuous code changes, and you also need to rerun all of the test cases, which takes months.
Another problem with a large project is the amount of code coverage data that is generated. For each line of code, do you save data for every test case? Some code is common code, such as initialization code, which is touched by every test case. With 3,000 test cases and 3 million lines of code, a very large table is needed to store each line of code together with every test case that exercised it.
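A rough worst-case estimate shows the scale. Assume every line is recorded against every test case (the situation for common initialization code), and assume, purely for illustration, either one bit per (line, test case) pair or an 8-byte test case name per pair:

```latex
\begin{aligned}
\text{cells} &= (3\times10^{6}\ \text{lines}) \times (3\times10^{3}\ \text{test cases}) = 9\times10^{9} \\
\text{bytes at 1 bit per cell} &= 9\times10^{9} / 8 \approx 1.1\times10^{9} \\
\text{bytes at 8 bytes per cell} &= 9\times10^{9} \times 8 = 7.2\times10^{10}
\end{aligned}
```

Actual data is sparser, since most lines are not exercised by every test case, but rows for common code still approach the full 3,000 entries.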
Handling Code Changes
The best scenario is to collect the code coverage data each time the test cases are executed and to handle code churn without rerunning any test case. To make this scenario work, one needs to update (re-sequence) the code coverage database with each code change. Re-sequencing eliminates the need to freeze the code while collecting the code coverage data: every time a code change is incorporated into the system, the re-sequencing routine makes the necessary adjustments to the database. How are the changes handled in the database?
There are three types of code changes: new lines, changed lines, and deleted lines. New lines of code are annotated "Not tested", changed lines are annotated "Re-test required", and deleted lines carry no annotation, since that code has been removed from the database. The line before and the line after each deletion are annotated as well. In addition, an annotation is made at the module level indicating that the module has changed and should be "Re-tested".
In the example below, the left-hand side shows the code before the changes and the right-hand side shows the code after the changes. Lines 6 and 23-27 were added to the original code. Line 12 was changed and line 13 was deleted from the original code. The annotation mark follows the line sequence number: "*" means the code was tested, "n" marks new, untested code, "r" marks changed code that should be re-tested, and "d" marks the lines immediately surrounding code that has been deleted.
Original Code                                        | Changed Code
00001 * HD_Unload:                                   | 00001 * HD_Unload:
00002 * Savearg = arg                                | 00002 * Savearg = arg
00003 * REST=UPPER(SaveArg)                          | 00003 * REST=UPPER(SaveArg)
00004 * PARSE REST WITH 'DB=' DB .                   | 00004 * PARSE REST WITH 'DB=' DB .
00005 * PARSE REST WITH 'DBMST=' DBMST .             | 00005 * PARSE REST WITH 'DBMST=' DBMST .
00006   /************************************/       | 00006 n PARSE REST WITH 'PART=' PART .
00007   /* Create DB list */                         | 00007   /************************************/
00008   /************************************/       | 00008   /* Create DB list */
00009 * if DB ¬= "" then                             | 00009   /************************************/
00010 * do                                           | 00010 * if DB ¬= "" then
00011 * odb = "@"|| DB                               | 00011 * do
00012 * Push ODB||' ON'                              | 00012 r odb = "$"||DB
00013 * DBR_DB_List = DB                             | 00013 d Push ODB|| '=ON'
00014 * end                                          | 00014 d end
00015   else                                         | 00015   else
00016   do                                           | 00016   do
00017   Resource_List = ''                           | 00017   Resource_List = ''
00018   if DBMST ¬= "" then                          | 00018   if DBMST ¬= "" then
00019   do                                           | 00019   do
00020   Call Create_Lists 'DBMST' DBMST              | 00020   Call Create_Lists 'DBMST' DBMST
00021   Resource_List = DBR_DB_List || Resource_list | 00021   Resource_List = DBR_DB_List || Resource_list
00022   end                                          | 00022   end
00023   end                                          | 00023 n if PART ¬= "" then
                                                     | 00024 n do
                                                     | 00025 n Call Create_Lists 'PART' PART
                                                     | 00026 n Resource_List = DBR_DB_List||Resource_list
                                                     | 00027 n end
                                                     | 00028   end
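Marks like these can be derived mechanically by comparing the old and new versions of a module. The Python sketch below is one possible approach, not the author's routine: it assumes a line-by-line diff from the standard library's `difflib.SequenceMatcher`, carries the old mark over for unchanged lines, marks inserted lines "n" and replaced lines "r", and marks the lines on either side of a deletion "d" unless they are already "n" or "r". The function name `resequence_marks` is hypothetical, and a generic line diff will not necessarily pair lines exactly as in the listing above.

```python
from difflib import SequenceMatcher


def resequence_marks(old: list[tuple[str, str]], new: list[str]) -> list[tuple[str, str]]:
    """Annotate the new version of a module, given the old (mark, text) lines.

    Marks: '*' tested, ' ' not executed (carried over unchanged),
    'n' new, 'r' changed/re-test, 'd' next to deleted code.
    """
    old_text = [text for _, text in old]
    marks = [" "] * len(new)
    deletions: list[int] = []  # positions in `new` where old lines were removed

    matcher = SequenceMatcher(None, old_text, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(j2 - j1):
                marks[j1 + k] = old[i1 + k][0]  # unchanged line keeps its old mark
        elif tag == "insert":
            for j in range(j1, j2):
                marks[j] = "n"
        elif tag == "replace":
            # Changed lines; if the old block was longer, the extra old lines
            # are absorbed here rather than reported as deletions.
            for j in range(j1, j2):
                marks[j] = "r"
        else:  # "delete": old lines i1..i2 have no counterpart in `new`
            deletions.append(j1)

    # Annotate the line before and the line after each deletion.
    for j in deletions:
        for neighbour in (j - 1, j):
            if 0 <= neighbour < len(new) and marks[neighbour] not in ("n", "r"):
                marks[neighbour] = "d"
    return list(zip(marks, new))


old = [("*", "x = 1"), ("*", "y = 2"), (" ", "z = 3"), ("*", "w = 4"), ("*", "v = 5")]
new = ["x = 1", "new = 0", "y = 2", "w = 4", "v = 50"]
print(resequence_marks(old, new))
# [('*', 'x = 1'), ('n', 'new = 0'), ('d', 'y = 2'), ('d', 'w = 4'), ('r', 'v = 50')]
```

Deletions are handled after the diff loop so that the line following a deletion already has its carried-over mark before it is overwritten with "d".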
|
The re-sequencing routine determines the changes needed in the code coverage database to adjust for the code changes (adds, changes, and deletes). With re-sequencing, code coverage data can be tracked while the functional test cases are being executed, and decisions about which test cases need to be changed, combined, or deleted can be made without completing the entire data collection process.
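Tracking coverage while functional test cases run then amounts to recording each execution against the current, re-sequenced code level. In the hypothetical Python sketch below, a `ModuleCoverage` record holds one mark and one set of test case names per line; recording a run marks the executed lines "*", and `pending()` lists the lines that still need testing, which is the information needed to decide on new or changed test cases before collection is complete. Clearing the module-level re-test flag once nothing is pending is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleCoverage:
    """Hypothetical per-module coverage record kept in step with the code by re-sequencing."""
    marks: list[str]                                     # one of '*', ' ', 'n', 'r', 'd' per line
    tests: list[set[str]] = field(default_factory=list)  # test case names per line

    def __post_init__(self) -> None:
        if not self.tests:
            self.tests = [set() for _ in self.marks]     # no test cases recorded yet

    def record_run(self, test_name: str, executed: set[int]) -> None:
        """Record one test case run; `executed` holds 1-based line sequence numbers."""
        for n in executed:
            self.tests[n - 1].add(test_name)
            self.marks[n - 1] = "*"                      # tested at the current code level

    def pending(self) -> list[int]:
        """Sequence numbers of lines still marked new, re-test, or next to a deletion."""
        return [i + 1 for i, m in enumerate(self.marks) if m in ("n", "r", "d")]

    def needs_retest(self) -> bool:
        """Module-level re-test flag (assumed to clear once nothing is pending)."""
        return bool(self.pending())


module = ModuleCoverage(marks=["*", "n", "d", "d", "r"])  # marks after re-sequencing
module.record_run("TC0001", {1, 2, 5})                    # hypothetical test case name
print(module.pending())        # [3, 4]
print(module.needs_retest())   # True
```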
Reducing Table Size
To reduce the table size, a table with a fixed number of columns can be created. Using a file pointer method, the last column holds a pointer to a flat file that lists the test cases that did not fit in the table. For example, assume the table is 100 columns wide, with the columns storing test case names and each row representing a line of code. If more than 100 test cases exercised a particular line of code, the 100th column holds the name of a file that contains the list of additional test cases. This method reduces the size of the table needed to store the code coverage data and makes the data more manageable.
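A minimal Python sketch of the file pointer method follows. It assumes rows are plain lists of test case names and uses a hypothetical `lineNNNNN.lst` naming scheme for overflow files; a real system would keep the rows in the coverage database rather than in memory. As a simplifying assumption, the sketch always reserves the last column for the pointer (empty when nothing overflows), so a name in that column is never mistaken for a test case. The example uses a 4-column table so the overflow is easy to see.

```python
import tempfile
from pathlib import Path


def store_row(line_no: int, tests: list[str], overflow_dir: Path, width: int = 100) -> list[str]:
    """Build a fixed-width row of test case names for one line of code.

    Columns 1..width-1 hold test case names; the last column holds the path of
    an overflow file listing the remaining test cases, or "" if none overflow.
    """
    slots = width - 1
    row = tests[:slots] + [""] * max(0, slots - len(tests))
    if len(tests) > slots:
        overflow = overflow_dir / f"line{line_no:05d}.lst"  # hypothetical naming scheme
        overflow.write_text("\n".join(tests[slots:]) + "\n")
        row.append(str(overflow))
    else:
        row.append("")
    return row


def read_row(row: list[str]) -> list[str]:
    """Recover every test case name, following the overflow file pointer if present."""
    tests = [t for t in row[:-1] if t]
    if row[-1]:
        tests += Path(row[-1]).read_text().splitlines()
    return tests


with tempfile.TemporaryDirectory() as tmp:
    names = [f"TC{i:04d}" for i in range(1, 8)]     # 7 test cases exercised this line
    row = store_row(12, names, Path(tmp), width=4)  # 3 name columns + 1 pointer column
    print(row[:3])                                  # ['TC0001', 'TC0002', 'TC0003']
    print(read_row(row) == names)                   # True
```

Rows for rarely executed code stay entirely within the table; only heavily shared lines, such as initialization code, pay the cost of reading an extra file.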
Summary
The re-sequencing method eliminates the need to rerun all of the test cases after a code change and keeps the database in sync with the current code level. The file pointer method reduces the table size required to hold all of the code coverage data while still providing the tester with all of the benefits of a code coverage system.
References
Jillian Ye, David Godwin, and Colin Mackenzie. "What is Uncovered by Coverage Testing?" Technical Report TR74.158.