Defect Removal Metrics Exercises

1/12/07

From the above DRE table, compute the following:

1.) Effectiveness of the code inspections. (“How good are the code inspections?”).

Number of defects found during code inspections = 61

Number of defects entering the code inspection = (total defects produced up to the code phase – defects found before the code phase, i.e. up to the design phase) = 151 – 21 = 130

Therefore, code inspection found 61 out of 130 defects.

Hence DRE of code inspections = 61/130 = 46.9%

2.) Effectiveness of unit testing (“How good are the unit tests?”).

Defects found during unit tests = 32

Defects entering unit tests = 159 – 82 = 77

Effectiveness of unit tests = 32/77 = 41.56%
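
As a quick sketch, the same DRE arithmetic in Python (the counts are the ones used above; the helper name is just illustrative):

    # Defect Removal Effectiveness (DRE) of a phase:
    # defects found in the phase / defects entering the phase.
    def dre(found_in_phase, entering_phase):
        return found_in_phase / entering_phase

    # Code inspections: 61 found, 151 - 21 = 130 entering.
    print(f"Code inspection DRE: {dre(61, 151 - 21):.1%}")   # ~46.9%

    # Unit tests: 32 found, 159 - 82 = 77 entering.
    print(f"Unit test DRE: {dre(32, 159 - 82):.2%}")         # ~41.56%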

3.) Requirements PCE (“How well are requirements problems found and fixed during the requirements inspections?”).

(Note the wording of the question: This is about finding and fixing just requirements problems, not about all problems. Hence, we need to look at the PCE, not the DRE which looks at all kinds of problems.)

Req defects found during req phase = 5

Total req defects = 15

Req PCE = 5/15 = 33.33%
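
A similar sketch for PCE; the only difference from DRE is that both the numerator and the denominator are restricted to defects of a single type (here, requirements defects):

    # Phase Containment Effectiveness (PCE) for requirements:
    # requirements defects found during the requirements phase / total requirements defects.
    def pce(found_in_phase, total_of_that_type):
        return found_in_phase / total_of_that_type

    print(f"Requirements PCE: {pce(5, 15):.2%}")   # ~33.33%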

4.) Given that the code size is 30 KLOC, compute the release defect density. (“How good is the product?”)

Total defects created up to the date of release = defects created up to ST = 165

Total defects found up to release = defects found up to ST = 148

Number of defects remaining in the product at the time of release = 165 – 148 = 17

Defect density at the time of release = 17 / 30 KLOC = 0.57 defects/KLOC.

5.) Coding defect injection rate. (“How frequently are mistakes made during coding?”)

Number of coding defects = 98

Rate of code defect injection = 98 / 30 KLOC = 3.27 defects / KLOC
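
A sketch of the two per-KLOC calculations above, using the numbers from these answers and the 30 KLOC code size given in the exercise:

    size_kloc = 30

    # Release defect density: defects still in the product at release, per KLOC.
    created, found = 165, 148
    remaining = created - found                                              # 17
    print(f"Release defect density: {remaining / size_kloc:.2f} defects/KLOC")   # ~0.57

    # Coding defect injection rate: coding defects made, per KLOC.
    coding_defects = 98
    print(f"Code defect injection rate: {coding_defects / size_kloc:.2f} defects/KLOC")  # ~3.27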

6.) The system has 6 modules, with sizes and defects found shown below. Which of the modules do you think may need redesign / redevelopment?

First, compute defect rates for each module, by dividing defects by the size of that module. Also, let us add a total defect rate column.

Module   Req defect rate   Design defect rate   Code defect rate   Overall defect rate
Mod1          1.00               6.00                16.00                23.00
Mod2          0.13               0.53                 1.73                 2.40
Mod3          0.38               0.75                 1.50                 2.63
Mod4          5.00               8.00                19.00                32.00
Mod5          0.33               2.33                 3.67                 6.33
Mod6          1.50               1.50                 7.00                10.00

(All rates are in defects/KLOC.)
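
For illustration, here is how such rates could be computed from raw counts. The sizes and counts below are made up (except that Mod1’s 1 KLOC size and its 1 requirements / 6 design / 16 code defects are consistent with the discussion that follows); the real inputs come from the exercise data:

    # Hypothetical raw data: {module: (size in KLOC, req defects, design defects, code defects)}
    modules = {
        "Mod1": (1.0, 1, 6, 16),   # consistent with the discussion below
        "Mod4": (1.0, 5, 8, 19),   # purely illustrative
    }

    for name, (size, req, des, code) in modules.items():
        rates = [req / size, des / size, code / size]          # defects/KLOC by type
        print(name, [round(r, 2) for r in rates], "overall:", round(sum(rates), 2))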

If we look at the overall defect rates, module 4 is clearly in the worst shape, and it also had a lot of req defects. It almost certainly could use major work.

Module 1 is in the next worst shape. The rate numbers look bad, but if we look at the absolute numbers, there was only one requirements defect. However, there were 6 design defects in 1 KLOC, so maybe it needs work.

Module 6 is next worst, but most of the defects seem to be code defects. We should look deeper; it may not need rework, though another code review may be useful (or, if the first review was quite thorough and that is why it uncovered so many defects, then maybe we don’t need another).

Module 5 is actually pretty average; none of the numbers look way out of line for its size. It is probably fine.

Modules 2 and 3 seem to be in good shape.

Some additional information regarding this question:

However, this is just a crude indicator. We really need to go behind the numbers to find out whether what we suspect is true, especially because the numbers are so small – in many cases, the densities look high because the size is only 1 or 2 KLOC. Interpreting from so few defects is a bit dangerous! The value is simply that we have a trigger for asking the developers a question: “do you think it would be helpful to take another look at module 4, given how many problems we are finding?”, and then just trust their answer.

Notice how crude this indicator is: would our opinion of how good modules 2 and 3 are change dramatically if we found that the problems found in req & design were highly significant? (e.g. “we didn’t realize that the user had a tight performance requirement on this operation. We have now added the requirement, but since we had no time to redevelop the module, we disabled a compression scheme that saved space on disk but took a long time”.) Now do we think the module needs redevelopment? That’s why the big, constant emphasis on “go behind the numbers”: the numbers don’t tell you everything.

Actually, looking at the numbers, I would wonder if modules 2 and 3 were tool-generated. For one thing, it seems odd that a 30 KLOC application would have two huge modules and a number of tiny modules. It is also interesting how few bugs there are in these large modules. If we go behind the numbers, we might find that these were auto-generated, and that’s why the numbers look so good. If we used function points instead of lines of code, suddenly these modules might look no different from the rest.


Your organization has executed many projects over the past few years, inspects all documents and code, and fills out test and inspection reports regularly. Given this data, how would you figure out the following:

7.) Whether your requirements techniques need improvement?

Look at defect injection rate for requirements, see if that rate seems acceptable, and if possible, compare to other similar organizations. (Need to look at organizational data i.e. averages across a number of projects, not just one project. Also, need to classify into groups of similar projects, so we don’t mix apples and oranges).

It is also useful to look at what % of all defects are requirements defects. If it is 1-2%, that may indicate that you should address other areas first, but if it is 20-30%, then that may indicate that your requirements practices are weak compared to your design, coding and other practices.

It is also useful to look at requirements inspection effectiveness, to know if those techniques need improvement. Also, it is not clear if inspections are the only way we find “defects” during requirements, or whether we also find them through prototypes, follow-up conversations with the customer, etc. If our data includes all of that, then we get a much better picture of whether our requirements techniques are pretty good.

8.) Whether your configuration management techniques need improvement?

A traditional DRE chart does not list defects according to whether they are config mgmt defects. In fact, our organization’s inspection and test reports may or may not have “config mgmt problems” as a defect classification. If it does, we can add a CM column to the DRE chart, and compute CM effectiveness. If not, we may have to go through all the data by hand, figure out which problems are CM problems, then compute effectiveness rates, and compare with “other similar organizations” if available.

9.) Whether your design inspections are effective?

Look at design DRE numbers (average across similar projects).

10.) Whether you are doing a good job of delivering quality code?

Look at the defect density in released code (average across similar projects).

11.) Whether coding mistakes are more likely in large modules, or in complex modules (i.e. complex functionality)?

Do a scatterplot of code defect injection rate vs. size to find the likelihood of mistakes in large modules. If there is a positive correlation (i.e. as size increases, code defect injection rate increases), then coding mistakes are more likely in large modules. (The correlation value tells us how confident we are in asserting this: correlations close to 1 indicate a strong relationship, correlations close to 0 say there is little or no relationship, and negative correlations indicate that coding mistakes are less likely in larger modules.)
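
As a sketch of this, assuming the per-module sizes and code defect injection rates are available as arrays (numpy and matplotlib are used here; the data values are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical per-module data: size (KLOC) and code defect injection rate (defects/KLOC).
    size = np.array([1.0, 15.0, 8.0, 1.0, 3.0, 2.0])
    inj_rate = np.array([16.0, 1.73, 1.50, 19.0, 3.67, 7.0])

    r = np.corrcoef(size, inj_rate)[0, 1]               # Pearson correlation coefficient
    slope, intercept = np.polyfit(size, inj_rate, 1)    # least-squares straight-line fit

    plt.scatter(size, inj_rate)
    plt.plot(size, slope * size + intercept)
    plt.xlabel("Module size (KLOC)")
    plt.ylabel("Code defect injection rate (defects/KLOC)")
    plt.title(f"r = {r:.2f}, slope = {slope:.2f}")
    plt.show()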

A scatterplot of code defect injection rate vs. complexity would similarly tell us whether complex modules are more likely to have coding mistakes. Note that unlike size, complexity is not always a readily available number! We will have to go through some subjective process of estimating a complexity number for each module.

Alternatively, we can group modules into “low complexity”, “medium complexity” and “high complexity” based on some criteria we define, then plot a histogram of average code defect injection rates vs. complexity. If defect rate is increasing significantly as complexity increases, that tells us that coding mistakes are more likely as complexity increases.
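
A sketch of this grouped version, assuming each module has already been assigned a complexity category by hand (the category assignments and rates below are made up):

    from collections import defaultdict
    import matplotlib.pyplot as plt

    # Hypothetical (complexity category, code defect injection rate) pairs per module.
    data = [("low", 1.7), ("low", 1.5), ("medium", 3.7), ("medium", 7.0),
            ("high", 16.0), ("high", 19.0)]

    groups = defaultdict(list)
    for category, rate in data:
        groups[category].append(rate)

    categories = ["low", "medium", "high"]   # ordinal order
    averages = [sum(groups[c]) / len(groups[c]) for c in categories]

    plt.bar(categories, averages)
    plt.ylabel("Average code defect injection rate (defects/KLOC)")
    plt.show()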

We are using a histogram rather than a scatterplot because the “low / medium / high” scale we have defined is an ordinal scale, not a ratio scale. Actually, we can still do a scatterplot, but we cannot compute a statistical correlation coefficient; we will just have to eyeball the relationship (“to the best of my knowledge!”).

To find out which has more impact on coding mistakes, complexity or size, we would have to fit a straight line to each set of data points, compute the slope of each line, and compare them. If complexity is really just an informal concept with ordinal values, then we cannot answer this mathematically/statistically; we can just show the two graphs and let viewers draw their own conclusions (basically because the question is not sufficiently well-defined for a mathematical answer).

Yes, this is a really complex question! The purpose of it was for you to think through questions like this, which arise all the time in real life, and understand how to use quality tools to answer questions like this. You also see the interplay of metrics, quality tools, statistics, and measurement scales here. I think it is really important for you to understand this, but until you have more practice with this, I will not ask such nasty questions (on this midterm)!

12.) How effective the coding standards and checklists that you introduced two years ago have been in reducing coding mistakes?

Look at the code defect injection rates before and after the standards and checklists were introduced, to see if they impacted the number of mistakes made. A run chart is the best tool for this, since we want to see the impact over time, and it may have taken a while for the use of the new coding standards and checklists to catch on. If we see a significant and consistent downward trend over time, that tells us that they have been effective. The slope of the trend line tells us how effective.
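
A minimal run-chart sketch, assuming we can pull a per-quarter code defect injection rate out of the organizational data (the quarters and rates below are made up):

    import matplotlib.pyplot as plt

    # Hypothetical code defect injection rates (defects/KLOC) per quarter.
    quarters = ["2005Q1", "2005Q2", "2005Q3", "2005Q4",
                "2006Q1", "2006Q2", "2006Q3", "2006Q4"]
    rates = [4.1, 3.9, 4.2, 3.5, 3.1, 2.8, 2.6, 2.4]

    plt.plot(quarters, rates, marker="o")
    plt.axvline(x=1.5, linestyle="--")   # rough marker for when the standards were introduced
    plt.ylabel("Code defect injection rate (defects/KLOC)")
    plt.title("Run chart of code defect injection rate")
    plt.show()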

13.) Whether teams consisting largely of highly experienced people are better at avoiding requirements and design errors than teams of less experienced people?

We use scatterplots of requirements and design error injection rates vs. average experience level of teams, and see whether there is a reasonably strong negative correlation, i.e. whether the straight line fitted to the points has a downward slope (injection rates decreasing as average experience increases).

Note that “average experience level of team” may or may not be what we want to use as the basis for differentiation. We have to define what we mean by “teams consisting largely of highly experienced people” – if that means, for example, “over 40% of the team has over 7 years of experience”, then we find the average defect injection rates for all projects that fit that criterion, and compare it to the average for all projects that fit our criterion for “teams of less experienced people”.
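
A sketch of this grouping, assuming each past project record carries the team’s experience profile and its combined requirements + design defect injection rate (the records and the 40% / 7-year thresholds are just the illustrative ones mentioned above):

    # Hypothetical project records:
    # (fraction of team with >7 years of experience, req+design defect injection rate in defects/KLOC)
    projects = [(0.55, 1.2), (0.10, 3.4), (0.45, 1.8),
                (0.05, 2.9), (0.60, 1.0), (0.20, 2.5)]

    experienced = [rate for frac, rate in projects if frac > 0.40]
    less_experienced = [rate for frac, rate in projects if frac <= 0.40]

    avg = lambda xs: sum(xs) / len(xs)
    print("Experienced teams:     ", round(avg(experienced), 2), "defects/KLOC")
    print("Less experienced teams:", round(avg(less_experienced), 2), "defects/KLOC")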