Jingyue Li, Tor Stålhane, and Reidar Conradi – NTNU, Trondheim, Norway

Enhancing Software Defect Tracking Systems to Facilitate Timely Software Quality Assessment and Improvement

For projects that rely on empirical process control and frequently deliver working versions of software, developers and project managers regularly need to examine the status of their software quality. Our studies illustrate that simple, goal-oriented changes or extensions to the existing data of their respective defect tracking systems can provide valuable and prompt information to improve their software quality assessment and assurance.

Keywords:

D.2.19.d Measurement applied to SQA and V&V, D.2.18.g Process implementation and change

1. Introduction

Data in the Defect Tracking Systems (DTSes) of software companies are usually used to ensure that reported defects eventually get fixed. Such defect-related data have an obvious potential to be used for Assessing Software Quality (ASQ) and for Software Process Improvement (SPI) [1].

We have examined the DTSes of nine Norwegian companies and found that most of the data entered in these systems were either never used, or were irrelevant, unreliable, and difficult to use for ASQ and SPI. In response to our findings, we improved the DTSes in two of the companies, which were willing to try the idea of enhancing DTSes to facilitate their ASQ and SPI. The improvements were defined according to the GQM paradigm [2]. Our main focus was either to (a) introduce new defect classification attributes into the existing DTS or (b) revise the values of existing ones. The goals were to provide project managers and developers with more up-to-date, relevant, reliable, and easy-to-analyze defect data, so that they can assess their software quality and identify potential SPI actions in a cost-effective way.

To evaluate how well the improved DTSes supported better ASQ and SPI, we performed two rounds of defect analyses. In the first round, we analyzed the newly reported defect data six months after both companies had deployed their improved DTSes. In parallel with this defect analysis, we held a workshop in one of the two companies to collect feedback on the improved DTS. In the other company, we conducted a survey by email for the same purpose. After the defect analysis, the results were reported to the companies and triggered corresponding SPI activities. Six and 12 months, respectively, after the SPI had been implemented in the two companies, we performed the second round of defect analysis. In this analysis, we compared defect data reported after the SPI with the defect data collected in the first round of defect analysis. The comparisons showed that the new SPI activities in these companies led to significantly lower defect densities and higher defect fixing efficiency.

Lessons learned from this study illustrate how to keep developers and testers motivated to enter high-quality defect data into their DTSes:

  • The DTS improvements must be lean, understandable, and goal-oriented.
  • The people who fill in the data must be given quick feedback to show that their entered data really are being used to benefit their daily work.

The study also revealed several pitfalls that typically reduce the quality of the reported defect data:

  • Failure to update previously entered defect attribute values when such defects are re-examined.
  • Reliance on default values for defect attributes.
  • Improper definition and updating of the defect attributes and their values.

2. Defect data in our investigated DTSes

Most of the examined companies are medium-sized or large. Except for one small company, all companies recorded data with at least 10 defect attributes, such as a textual summary, a detailed description, priority, severity, and the calendar dates of reporting and fixing defects. The defect attributes included in the DTSes of the nine examined companies are listed in Table 1. We found that some defect data of the examined companies were ready for ASQ or SPI, for example:

  • All companies recorded the date and time that a defect report was created. Six of the nine companies used a dedicated attribute to record the email address or name of the person who created the original defect report. Seven companies assigned a severity value to each defect. By combining such data, a company can quickly find the critical quality issues, for example severe defects that were reported after a release by an important customer (illustrated in the sketch after this list).
  • Seven companies recorded the name of the affected “modules”. This information can help developers identify the most defect-prone or change-prone parts of a system. Examining how to eliminate these “hot” parts can help companies maximize the Return On Investment (ROI).
  • Information in the work logs may indicate what should be improved to speed up the defect fixing. For example, developers’ complaints about a complex architecture indicate that the software design needs to be adjusted.
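As an illustration of the kind of quick analysis such data enable, the sketch below counts defects per module and filters severe post-release defects reported by an important customer. It is a minimal sketch: the attribute names (“module”, “severity”, “created”, “reporter”) and the example values are our assumptions and would have to be mapped to a company's actual DTS export.

    from collections import Counter
    from datetime import datetime

    # Hypothetical defect records as exported from a DTS; real attribute names differ per company.
    defects = [
        {"id": 1, "module": "payment", "severity": "critical",
         "created": datetime(2006, 3, 1), "reporter": "key-customer@example.com"},
        {"id": 2, "module": "payment", "severity": "minor",
         "created": datetime(2006, 2, 10), "reporter": "tester@company.example"},
        {"id": 3, "module": "reporting", "severity": "critical",
         "created": datetime(2006, 3, 5), "reporter": "key-customer@example.com"},
    ]
    release_date = datetime(2006, 2, 20)  # assumed release date

    # Most defect-prone modules: candidates for the refactoring with the highest ROI.
    print(Counter(d["module"] for d in defects).most_common())

    # Critical quality issues: severe defects reported by an important customer after the release.
    critical = [d["id"] for d in defects
                if d["severity"] == "critical"
                and d["created"] > release_date
                and d["reporter"] == "key-customer@example.com"]
    print(critical)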

However, all companies used only some of the defect data, and only for project management purposes, i.e., tracking the status of defects. None of the companies had analyzed the ASQ- and SPI-related data in their DTSes. The assembled information was, in some sense, an “information graveyard”.

As the DTSes were conceived without ASQ and SPI goals in mind, not much of the existing DTS data was adequate for ASQ and SPI purposes, due to:

  • Lack of necessary data: Most companies were satisfied that a defect was somehow fixed, without caring how much effort was spent on fixing the defect or why the defect was introduced in the first place. None of the companies recorded the actual effort used to fix a defect. Thus, little information is available for measuring how cost-effective the defect fixing is and for performing “root cause analysis” to prevent further defects, especially the ones that are costly to fix.
  • Incomplete data: We found that more than 20% of the data in several defect attributes, such as severity or location, had not been filled in (see the sketch after this list).
  • Inconsistent data: We found that some people used the name of an embedding module or subsystem as the location of a defect, while others gave the name of a function.
  • Mixed data: Four companies, i.e. AN, CS, SN, and DA, did not use a separate attribute to describe how defects were discovered. The procedure an employee used to discover a defect is mixed with other text in the short summary or detailed description of the defect. Thus, extracting testing-related information for ASQ or SPI purposes is difficult.
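A simple check over an exported defect list can quantify the incompleteness mentioned above. This is a minimal sketch, assuming the DTS can export defect reports as records in which unfilled attributes are empty or missing; the attribute names are illustrative only.

    # Sketch of an attribute-completeness check over an exported defect list.
    defects = [
        {"severity": "major", "location": "", "summary": "crash on start-up"},
        {"severity": "", "location": "ui/login", "summary": "typo in label"},
        {"severity": "minor", "location": None, "summary": "search is slow"},
    ]

    for attr in ("severity", "location", "summary"):
        missing = sum(1 for d in defects if not d.get(attr))
        print(f"{attr}: {missing / len(defects):.0%} missing")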

3. Two case studies of DTS improvement

We helped two companies from Table 1, namely DP and PW, to improve their DTSes by following the GQM paradigm when revising and introducing a defect classification scheme [3][4].

3.1. DTS improvement in company DP

Motivation: Company DP is a software house that builds business-critical systems, primarily for the financial sector. Personnel in different departments of the company used the existing DTS in different ways. Since nobody used the system in a “useful” way, there were few incentives to improve either the system or its use. However, a gap analysis that we performed in this company showed that one of the main concerns of the testers and developers was the company’s defect reporting and prioritization process. Another main concern was to reduce the defect fixing effort.

Goal and questions: The goal of the DTS improvement was to reduce the defect density and to improve defect fixing efficiency. To achieve this goal, we wanted to improve the DTS to provide supplementary information that the Quality Assurance (QA) managers could use to answer the following questions:

  • What were the main types of defects?
  • What can the company do to prevent defects in the early stages of a project?
  • What are the reasons for the actual defect fixing effort?

Metrics: The existing DTS of this company was not instrumented to collect data for answering our questions. We proposed to revise it, based on an analysis of existing data and the QA manager's suggestions. To avoid abrupt changes, we did not introduce new defect attributes into the DTS, but only revised the values of the existing ones.

Table 1. Defect attributes in the examined DTSes

Examined companies (number of employees): AN (320), CO (180), CS (92,000), PW (500), DP (6,000), SN (400), DT (9,000), SA (30,000), DA (10). The number in parentheses after each attribute is how many of the nine companies record it.

Description: Defect report id (7), Short textual summary (5), Detailed description (6), High-level category* (7)
Timestamp and persons involved: Created date and time (9), Creator (7), Modified date and time (7), Modified by (3), Responsible person (5), Due time (2), Closed time (3), Estimated duration to fix (2)
Impact: Priority (7), Severity (7)
Status trace: Status (6), Resolution (2), New release No. after fix (2)
Test activity: Tester (3), Test ID (3), Test priority (1), Test description (5)
Location: Release (2), Module (7), Version (4), OS & hardware (1)
Supplementary info.: Comments (5), Related link (3), Work log (3)

* High-level category: defect/enhancement/duplication/not-defect

Validation of the DTS improvement: We performed two rounds of validation of the improvement proposal, together with the test manager, one developer, and one project manager, through a detailed analysis of defects from earlier projects. Examples of improved attributes in the DTS of this company, after validation, are:

  • Fixing type: Introduced a new set of values to categorize the defect fixing activities of the developers.
  • Effort: Three qualitative values classify the defect fixing effort as “simple”, “medium”, or “extensive”. “Simple” means that the developers spend less than 20 minutes in total to reproduce, analyze, and fix a defect; “medium” means the effort is between 20 minutes and 4 hours; and “extensive” means the effort is more than 4 hours (see the sketch after this list). We used such a simplified Likert scale because it is not cost-effective to ask developers to fill in a more precise number that will not benefit our intended analysis.
  • Root cause: The values here are project entities, such as requirement, design, development, and documentation, to characterize the origins of each defect.
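A minimal sketch of how these revised attributes could be encoded is given below, using the effort thresholds stated above. The enumeration literals and the helper that maps a measured duration to an effort category are our own illustrations; developers in company DP select the qualitative value directly rather than recording minutes.

    from enum import Enum

    class Effort(Enum):           # qualitative defect fixing effort, as revised in company DP's DTS
        SIMPLE = "simple"         # less than 20 minutes to reproduce, analyze, and fix
        MEDIUM = "medium"         # between 20 minutes and 4 hours
        EXTENSIVE = "extensive"   # more than 4 hours

    class RootCause(Enum):        # project entities characterizing the origin of a defect
        REQUIREMENT = "requirement"
        DESIGN = "design"
        DEVELOPMENT = "development"
        DOCUMENTATION = "documentation"

    def effort_category(minutes: float) -> Effort:
        """Illustrative helper: map a measured fixing duration to the qualitative scale."""
        if minutes < 20:
            return Effort.SIMPLE
        if minutes <= 4 * 60:
            return Effort.MEDIUM
        return Effort.EXTENSIVE

    print(effort_category(150))  # Effort.MEDIUM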

Follow-up: After the validation, we gave a presentation to developers, testers, and project managers. We explained what the company could use the revised attributes for and what they could get back from using the improved DTS. To avoid incomplete data, the company also revised the workflow around the DTS, so that developers and testers were reminded to fill in defect data before closing a defect.

3.2. DTS improvement in company PW

Motivation: Company PW is a software product-line company with only one product, which it deploys on more than 50 different operating system and hardware platforms. The results of a similar gap analysis in this company showed that its QA personnel gave priority to a more formal DTS. The QA managers wanted a mechanism to analyze defect information quickly, because the company receives thousands of defect reports every month and the external release lifecycle is around three months.

Goal and questions: The goal was also to reduce the defect density and to improve defect fixing efficiency. The DTS was improved to provide supplementary information that the company can use to answer the following questions:

  • What can the company do to prevent defects in the early stages of a project, and to detect defects before the software reaches the customers?
  • Which testing activities discovered or reproduced most defects?
  • What are the reasons for the actual defect fixing effort?

Metrics: We added and revised defect attributes based on the IBM ODC [3], the “suspected cause” attribute of the IEEE standard [4], and suggestions from the QA managers of the company.

Validation of the DTS improvement: We performed two rounds of validation together with one company QA manager, one tester, and one developer, by trying to classify defects that were reported in previous projects. Examples of the added and revised attributes in the DTS of this company, after validation, are:

  • Effort: As for company DP, but with only two values, i.e. “quick-fix” and “time-consuming”. “Time-consuming” means that more than one person-day in total was spent reproducing, analyzing, and fixing the defect.
  • Fixing type: Values combine an extension of the IBM ODC “type” attribute values [3] with the categories of typical defect fixing activities of this company.
  • Severity: Values are defined according to the impact of a defect on running the software.
  • Trigger: Values are the categories of the typical testing activities of the company.
  • Root cause: The same as for company DP.

Follow-up: After validation, we gave a presentation to project managers to explain the added or revised attributes. We uploaded a revised online manual of the DTS to help people use the improved DTS. To avoid making large changes in the system, we separated the newly added attributes from the existing ones as “extra” attributes.

4. Software quality insights in the companies from improved DTS

4.1. Company DP: Defect data supplemented the results of an earlier Post Mortem Analysis (PMA)

In the first round of defect analysis in this company, we downloaded information on 1053 defects reported during the system tests of two releases of a large system. By analyzing the “root cause” and “fixing type” attributes, we found that 397 of these defects were related to development and were responsible for the majority of the defect fixing effort. We found that most of these 397 defects comprised wrong or missing functionality, or an incorrect or missing text message shown to the users of the system. When the QA manager saw the analysis results, she explained that those defects were probably caused by hiring a large number of consultants who had excellent development and coding experience, but insufficient domain knowledge of banking systems. When the developers did not possess sufficient domain knowledge of the intended application area, system quality suffered accordingly. Without such defect data and analysis, the QA manager would not have acquired this insight, especially since an earlier PMA showed that the developers of this company were proud of their application domain knowledge and preferred high-level requirements specifications, because this allowed them to use their creativity in later design and coding.

In response to the defect analysis results, the company changed its hiring strategy by putting more emphasis on evaluating the domain knowledge of new staff before recruiting them. Six months later, we collected the newly reported defect data of the follow-up releases of the same system and compared them with the defect data we had collected before the hiring strategy change. The results showed that the percentage of the total defect fixing effort spent on defects related to missing domain knowledge was reduced from 60% to 30% [1]. Additionally, the average effort to fix a defect, across all defect types, was reduced by 25%.
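Because the fixing effort is recorded only on a qualitative scale, effort-share figures such as the 60% above require mapping each category to a representative effort value. The sketch below shows one plausible way to do this; the representative hours per category are purely our assumption, not figures used by company DP.

    # Sketch: estimate the share of total fixing effort attributable to one root cause when
    # only qualitative effort values exist. The hours per category are assumed, not measured.
    WEIGHT_HOURS = {"simple": 0.2, "medium": 2.0, "extensive": 8.0}

    defects = [
        {"root_cause": "development", "effort": "extensive"},
        {"root_cause": "requirement", "effort": "medium"},
        {"root_cause": "development", "effort": "simple"},
        {"root_cause": "design", "effort": "medium"},
    ]

    total = sum(WEIGHT_HOURS[d["effort"]] for d in defects)
    dev = sum(WEIGHT_HOURS[d["effort"]] for d in defects if d["root_cause"] == "development")
    print(f"development-related share of fixing effort: {dev / total:.0%}")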

4.2. Company PW: Defect data helped to support the project manager's SPI decision

In the first round of defect analysis in company PW, we downloaded and analyzed 796 defects from two projects. The developers had classified 166 of these defects as “time-consuming”. Simple statistical analyses of the “fixing type” attribute of the “time-consuming” defects showed that:

  • 60% of the “time-consuming” defects could have been detected easily by more thorough code reviews; they involved, for example, wrong algorithms or missing exception checking and handling.

One project manager of the examined projects had a feeling that his project needed more formal code and design reviews. However, he had no solid data to support this feeling and justify a decision, because code reviews require extra effort. After seeing the defect analysis results, this project manager, as a first step, required the project developers to perform a formal code review after each defect fix. As in company DP, we collected the newly reported defect data in the same project 12 months after the new code reviews were enforced, and then compared them with the data we had previously collected. The results show that the percentage of post-release defects attributed to defect fixing, among all post-release defects, was reduced from 33% to 8%. The corresponding percentages were 20% and 10% in the companies studied in [5].
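The before-and-after comparison here is essentially the ratio of post-release defects whose origin was an earlier fix. The following is a minimal sketch under the assumption that such defects can be identified from the recorded attributes; the boolean field “introduced_by_fix” and the toy data are our illustrations, not company PW's actual encoding.

    # Sketch: percentage of post-release defects attributed to earlier defect fixes.
    def bad_fix_rate(post_release_defects):
        caused_by_fix = [d for d in post_release_defects if d["introduced_by_fix"]]
        return len(caused_by_fix) / len(post_release_defects)

    before = [{"introduced_by_fix": i < 33} for i in range(100)]  # toy data: 33 of 100 defects
    after = [{"introduced_by_fix": i < 8} for i in range(100)]    # toy data: 8 of 100 defects
    print(f"{bad_fix_rate(before):.0%} -> {bad_fix_rate(after):.0%}")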

5. Lessons learned from data collection

For data-driven SPI decisions, high-quality DTS data are a prerequisite. The study revealed several issues regarding the collection of high-quality defect data.

5.1. Data collection should be goal-oriented and lean, not merely “nice to have”

Before proposing an improvement to the DTS, company managers should have a clear goal of what analyses they want to perform and why. Following the GQM spirit of lean and relevant data, we gathered only the minimum data needed for the intended analysis. For example, we used only qualitative defect fixing effort values such as “simple”, “medium”, and “extensive”, to save developers the effort of filling in accurate numbers, since the focus was to identify and prevent the “time-consuming” defects, not to do a full ROI analysis.