Assessing and Estimating Corrective, Enhancive, and Reductive Maintenance Tasks: A Controlled Experiment

Vu Nguyen (a), Barry Boehm (a), Phongphan Danphitsanuphan (b)

(a) Computer Science Department, University of Southern California, Los Angeles, USA

(b) Computer Science Department, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand

Abstract

This paper describes a controlled experiment of student programmers performing maintenance tasks on a C++ program. The goal of the study is to assess the maintenance size, effort, and effort distributions of three different maintenance types and to describe estimation models that predict the programmer’s effort on maintenance tasks. Twenty-three graduate students and one senior majoring in computer science participated in the experiment in a software engineering lab. Each student was asked to perform the maintenance tasks required for one of three task groups. The impact of different LOC metrics on maintenance effort was also evaluated by fitting the collected data to various estimation models. The results of our study suggest that corrective maintenance is much less productive than enhancive and reductive maintenance. Our results generally confirm previous findings that program comprehension activities require as much as 50% of total effort in corrective maintenance. Moreover, the best effort estimation model estimates the effort of 79% of the programmers with an error of 30% or less. The model achieved this performance level by using the LOC added, modified, and deleted metrics as independent size variables.

Keywords: software maintenance; software estimation; maintenance experiment; COCOMO; maintenance size

1. Introduction

Software maintenance is crucial to ensuring the useful lifetime of software systems. According to previous studies [1][4][29], the majority of software-related work in organizations is devoted to maintaining existing software systems rather than building new ones. Despite advances in programming languages and software tools that have changed the nature of software maintenance, programmers still spend a significant amount of effort working with source code directly and manually. Thus, it remains an important challenge for the software engineering community to assess maintenance cost factors and develop techniques that allow programmers to accurately estimate their maintenance work.

A typical approach to building estimation models is to determine which factors affect effort, and by how much, at different levels, and then use these factors as input parameters in the models. For software maintenance, the modeling process is even more challenging. Maintenance effort is affected by a large number of factors, such as the size and type of maintenance work, personnel capabilities, the level of the programmer’s familiarity with the system being maintained, the processes and standards in use, complexity, technologies, and the quality of the existing source code and its supporting documentation [5][18].

There has been tremendous effort in the software engineering community to study cost factors and the amount of impact they have on maintenance effort [6][20]. A number of models have been proposed and applied in practice, such as [2][5][12]. Although maintenance size measured in source lines of code (LOC) is the most widely used factor in these models, there is a lack of agreement on what to include in the LOC metric. While some models determine the metric by summing the number of LOC added, modified, and deleted [2][21], others such as [5] use only the LOC added and modified. The latter implicitly assumes that the deleted LOC is not significantly correlated with effort. For example, a task that adds 300, modifies 200, and deletes 100 LOC is sized at 600 LOC under the former measure but only 500 LOC under the latter. This inconsistency in the size measure results in discrepancies among strategies proposed to improve software productivity and in problems comparing and converting estimates among estimation models.

In this paper, we describe a controlled experiment of student programmers performing maintenance tasks on a small C++ program. The purpose of the study was to assess the size and effort implications and the labor distribution of three different maintenance types and to describe estimation models that predict the programmer’s effort on maintenance tasks. We focus the study on the enhancive, corrective, and reductive maintenance types of the maintenance typology proposed by Chapin et al. [9]. We chose to study these types because they are the ones that change the business rules of the system by adding, modifying, and deleting source code, and they are typically the most common activities of software maintenance. The results of our study suggest that corrective maintenance is less productive than enhancive and reductive maintenance. These results are largely consistent with the conclusions of previous studies [2][17]. The results further provide evidence on the effort distribution of maintenance tasks, in which program comprehension requires as much as 50% of the maintainer’s total effort. In addition, our results on effort estimation models show that using the three LOC added, modified, and deleted metrics as independent variables in the model will likely result in higher estimation accuracy.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 provides a method for calculating the equivalent LOC of maintenance programs. The experiment design and results are discussed in Sections 4 and 5. Section 6 describes models for estimating programmers’ effort on maintenance tasks. Section 7 discusses the results. Section 8 discusses various threats to the validity of the research results, and the conclusions are given in Section 9.

2. Related Work

Many studies have been published that address different size- and effort-related issues of software maintenance and propose approaches to estimating software maintenance cost. To help better understand and assess software maintenance work, Swanson [31] proposes a typology that classifies software maintenance into adaptive, corrective, and perfective maintenance types. This typology has become popular among researchers, and the IEEE has adopted these types in its Standard for Software Maintenance [19], along with an additional preventive maintenance type. In their proposed ontology of software maintenance, Kitchenham et al. [22] define two maintenance activity types, corrections and enhancements. The latter can be generally equated to the adaptive, perfective, and preventive maintenance types in Swanson’s and the IEEE’s definitions. Chapin et al. [9] propose a fine-grained classification of twelve types of software maintenance and evolution. These types are grouped into four clusters, support interface, documentation, software properties, and business rules, listed in order of their impact on the software. The last cluster, which consists of the reductive, corrective, and enhancive types, includes all activities that alter the business rules of the software. Chapin et al.’s classification does not correspond cleanly to the types defined by Swanson. As an exhaustive typology, however, it includes not only Swanson’s and the IEEE’s maintenance types but also other maintenance-related activities such as training and consulting.

Empirical evidence on the distribution of effort among maintenance activities helps estimate maintenance effort more accurately, through the use of appropriate parameters for each type of maintenance activity, and helps better allocate maintenance resources. It is also useful for providing effort estimates for maintenance activities that are performed by different maintenance providers. Basili et al. [2] report an empirical study that characterizes the effort distribution among maintenance activities and provides a model for estimating the effort of software releases. Among the findings reported, isolation activities were found to consume a higher percentage of effort in error corrections than in enhancement changes, while a much smaller proportion of effort was spent on inspection, certification, and consulting in error corrections. The remaining activities, analysis, design, and code/unit test, were found to take virtually the same proportions of effort across these two types of maintenance.

Mattsson [25] describes a study of data collected from four consecutive versions of a six-year object-oriented application framework project. The study reports evolutionary trends in the relative effort distribution of four technical phases (analysis, design, implementation, and test) across the four versions, showing that the proportion of implementation effort tends to decrease from the first version to the fourth, while the proportion of analysis effort follows the reverse trend. Similarly, Yang et al. [32] present results from an empirical study of the effort distribution of a series of nine projects delivering nine successive versions of a software product. All projects are of the maintenance type except the first, which delivered the first version of the series. The coding activity was found to account for the largest proportion of effort (42.8%), while the requirements and design activities consumed only 10.2% and 14.5%, respectively. In addition to analyzing the correlation between maintenance size and productivity metrics and deriving effort estimation models for maintenance projects, De Lucia et al. [13] describe an analysis of the effort distribution among five phases, namely inventory, analysis, design, implementation, and testing. The analyses were based on data obtained from a large Y2K project following the maintenance processes of a software organization. Their results show that the design phase is the most expensive, consuming about 38% of total effort, while the analysis and implementation phases account for small proportions, about 11% each. These results are somewhat contrary to the results reported in [32]. A more recent study by the same authors presents estimation models and the distribution of effort from a different project in the same organization [14].

A number of studies address issues related to characterizing size metrics and building cost estimation models for software maintenance. In his COCOMO model for software cost estimation, Boehm presents an approach to estimating the annual effort required to maintain a software product. It uses a factor named Annual Change Traffic (ACT) to adjust the maintenance effort based on the effort estimated or actually spent developing the software [7]. ACT specifies the estimated fraction of LOC that undergoes change during a typical year. It includes source additions and modifications but excludes deletions. If sufficient information is available, the annual maintenance effort can be further adjusted by a maintenance effort adjustment factor computed as the product of predetermined effort multipliers. In a major extension, COCOMO II, the model introduces new formulas and additional parameters to compute the size of maintenance work and the size of reused and adapted modules [5]. The additional parameters take into account factors such as the complexity of the legacy code and the familiarity of programmers with the system. In a more recent extension of the model to maintenance cost estimation, Nguyen proposes a set of formulas that unifies the two approaches to computing the size of software reuse and maintenance given in COCOMO II [28]. The extension also takes into account the size of source code deletions and calibrates new rating values of the cost drivers specific to software maintenance.
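In our own notation (a paraphrase of [7], not a formula quoted from it), the ACT-based relationship can be sketched as

PM_maint = ACT × MAF × PM_dev

where PM_dev is the estimated or actual development effort, ACT is the annual change traffic, MAF is the optional maintenance effort adjustment factor (taken as 1.0 when no adjustment information is available), and PM_maint is the resulting annual maintenance effort.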

In addition to characterizing the effort distribution of maintenance releases, Basili et al. [2] describe a simple regression model for estimating the effort needed to maintain and deliver a release. The model uses a single variable, LOC, measured as the sum of added, modified, and deleted LOC, including comments and blanks. The prediction accuracy was not reported, although the coefficient of determination was relatively high (R2 = 0.75), indicating that LOC is an important predictor of maintenance effort. Jorgensen evaluated eleven different models for estimating the effort of individual maintenance tasks using regression, neural network, and pattern recognition approaches [21]. The models use the size of maintenance tasks, measured as the sum of added, updated, and deleted LOC, as the main size input. The best model generated effort estimates within 25 percent of the actuals 26 percent of the time, with a mean magnitude of relative error (MMRE) of 100 percent.

Several previous studies have proposed and evaluated models exclusively for estimating the effort required to implement corrective maintenance tasks. De Lucia et al. used multiple linear regression to build effort estimation models for corrective maintenance projects [12]. Three models were built using coarse-grained metrics, namely the number of tasks requiring source code modification (NA), the number of tasks requiring fixes of data misalignment (NB), the number of other tasks (NC), the total number of tasks, and the LOC of the system to be maintained. They evaluated the models on 144 observations, each corresponding to a one-month period, collected from five corrective maintenance projects in the same software services company. The best model, which includes all of the metrics, achieved effort estimates within 25 percent of the actuals 49.31 percent of the time, with an MMRE of 32.25 percent. Comparing with a non-linear model previously used by the same company, they showed that a linear model with the same variables produces higher estimation accuracy. They also showed that taking into account the differences among types of corrective maintenance tasks can improve the performance of the estimation model.
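The studies above report accuracy using PRED(25), the fraction of estimates within 25 percent of the actuals, and MMRE. As a minimal, self-contained sketch of these two measures (the function and variable names are ours, not from the cited studies):

    # Minimal sketch of the accuracy measures cited above; names are ours.

    def mmre(actuals, estimates):
        # Mean magnitude of relative error over paired observations.
        errors = [abs(a - e) / a for a, e in zip(actuals, estimates)]
        return sum(errors) / len(errors)

    def pred(actuals, estimates, k=25):
        # Fraction of estimates within k percent of the actual values.
        hits = [abs(a - e) / a <= k / 100.0 for a, e in zip(actuals, estimates)]
        return sum(hits) / len(hits)

    # Illustrative effort values in person-hours (made-up numbers).
    actual = [10.0, 8.0, 12.5, 6.0]
    estimated = [11.0, 5.0, 12.0, 9.0]
    print(mmre(actual, estimated), pred(actual, estimated, k=25))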

3. Calculating Equivalent LOC

In software maintenance, the programmer works on the source code of an existing system. The delivered maintained software includes source lines of code reused, modified, added, and deleted from the existing system. Moreover, the maintenance work is constrained by the existing architecture, design, implementation, and technologies used. These constraints require maintainers to spend extra time comprehending, testing, and integrating the maintained pieces of code. Thus, an acceptable estimation model should take these characteristics of software maintenance into account through its estimation of either size or effort.

In this experiment, we adapt the COCOMO II reuse model to determine the equivalent LOC of the maintenance tasks. The model involves determining the amount of software to be adapted, the percentage of design modified (DM), the percentage of code modified (CM), the percentage of integration and testing (IM), the degree of Assessment and Assimilation (AA), understandability of the existing software (SU), and the programmer’s unfamiliarity with the software (UNFM). The last two parameters directly account for the programmer’s effort to comprehend the existing system.

The equivalent LOC formula is defined as

Equivalent LOC = TRCF × AAM                (1)

where

  • TRCF = the total LOC of the task-relevant code fragments, i.e., the portion of the program that the maintainers have to understand to perform their maintenance tasks.
  • AAM = the adaptation adjustment modifier of the COCOMO II reuse model, computed from DM, CM, IM, AA, SU, and UNFM [5].
  • S = the size in LOC.
  • SU = the software understandability. SU is measured as a percentage ranging from 10% to 50%.
  • UNFM = the level of programmer unfamiliarity with the program. The UNFM rating scale ranges from 0.0 to 1.0, or from “Completely familiar” to “Completely unfamiliar”. Numeric values of SU and UNFM are given in Table 2 in Appendix A.

LOC is the measure of logical source statements (i.e., logical LOC) according to COCOMO II’s LOC definition checklist given in [5] and further detailed in [26]. LOC does not include comments and blanks; more importantly, it counts the number of source statements regardless of how many lines a statement spans.

TRCF is not a size measure of the whole program to be maintained. Instead, it reflects only the portions of the program’s source code that are touched by the programmer. Ko et al. studied maintenance activities performed by students, finding that the programmers collected working sets of task-relevant code fragments, navigated dependencies, and edited the code within these fragments to complete the required tasks [23]. This as-needed strategy [24] does not require the maintainer to understand code segments that are not relevant to the task. Equation (1) reflects this strategy by including only the task-relevant code fragments rather than the whole adapted program. The task-relevant code fragments are the functions and blocks of code that are affected by the changes. A sketch of the computation is given below.
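To make equation (1) concrete, the following is a minimal Python sketch assuming the standard COCOMO II formula for AAM [5]; the function names and the illustrative parameter values are ours:

    # Sketch of equation (1) using the COCOMO II reuse model's AAM [5].
    # Parameter names follow the text (DM, CM, IM, AA, SU, UNFM); the
    # function names and example values are ours.

    def aam(dm, cm, im, aa, su, unfm):
        # Adaptation adjustment factor from the percentages of design
        # modified, code modified, and integration/test required.
        aaf = 0.4 * dm + 0.3 * cm + 0.3 * im
        # COCOMO II uses two cases depending on the amount of change.
        if aaf <= 50:
            return (aa + aaf * (1 + 0.02 * su * unfm)) / 100.0
        return (aa + aaf + su * unfm) / 100.0

    def equivalent_loc(trcf, dm, cm, im, aa, su, unfm):
        # Equation (1): Equivalent LOC = TRCF x AAM.
        return trcf * aam(dm, cm, im, aa, su, unfm)

    # Example: 400 LOC of task-relevant code, 10% design and 15% code
    # modified, 20% retesting, AA = 2, low understandability (SU = 40%),
    # and a mostly unfamiliar maintainer (UNFM = 0.8).
    print(equivalent_loc(400, dm=10, cm=15, im=20, aa=2, su=40, unfm=0.8))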

4. Description of the Experiment

4.1. Hypotheses

According to Boehm [7], a programmer’s maintenance activities consist of understanding the maintenance task requirements, code comprehension, code modification, and unit testing. Although only the last two activities deal with source code directly, empirical studies have shown high correlations between the overall maintenance effort and the total LOC added, modified, and deleted (e.g., [2][21]). We hypothesize that these activities have comparable distributions of programmer effort regardless of what types of changes are made. Indeed, with the same cost factors [5], such as program complexity and project and personnel attributes, the productivity of enhancive tasks is expected not to differ from that of corrective and reductive maintenance. Thus, we have the following hypotheses:

Hypothesis 1: There is no difference in the productivity among enhancive, corrective, and reductive maintenance.

Hypothesis 2: There is no difference in the division of effort across maintenance activities.
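As an illustration only (not necessarily the analysis applied later in this paper), Hypothesis 1 could be tested by comparing the productivity samples of the three groups with a distribution-free test such as Kruskal-Wallis; a minimal sketch with made-up numbers:

    # Illustrative test of Hypothesis 1 with made-up productivity data
    # (LOC per hour); not the paper's actual data or necessarily its test.
    from scipy.stats import kruskal

    enhancive = [12.1, 9.8, 14.3, 11.0, 10.5]
    corrective = [5.2, 6.1, 4.8, 7.0, 5.9]
    reductive = [13.5, 10.9, 12.2, 15.1, 11.8]

    stat, p = kruskal(enhancive, corrective, reductive)
    # A small p-value (e.g., p < 0.05) would reject Hypothesis 1.
    print(stat, p)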

4.2. The Participants and Groups

We recruited one senior undergraduate and 23 graduate computer-science students who were participating in our directed research projects. Participation in the experiment was voluntary, although we gave the participants a small incentive by exempting them from the final assignment. By the time the experiment was carried out, all participants had been asked to compile and test the program as part of their directed research work. However, according to our pre-experiment survey, their level of unfamiliarity with the program code (UNFM) varied from “Completely unfamiliar” to “Completely familiar”. We rated UNFM as “Completely unfamiliar” if the participant had not read the code and as “Completely familiar” if the participant had read and understood the source code and modified some parts of the program prior to the experiment.

The performance of participants is affected by many factors, such as programming skills, programming experience, and application knowledge [5][8]. We assessed the expected performance of the participants through a pre-experiment survey and a review of their resumes. All participants claimed programming experience in C/C++, Java, or both, and 22 participants already had working experience in the software industry. On average, participants claimed 3.7 (±2) years of programming experience and 1.9 (±1.7) years of experience in the software industry.