Chapter 2
Relevant Prior Research
2.1 Basic Object-Oriented Concepts
To provide context for the literature described in this dissertation, this section outlines some basic concepts and terminology of object-oriented technology and gives a brief overview of the object-oriented software development life cycle and how it differs from traditional life cycles.
The difference between object-oriented and traditional, functionally oriented approaches to software development is predominantly one of mind-set, which subsequently influences the software's organization and the development process [Andersen and Sheffler, 1992]. Functionally oriented software is organized into functions that are applied to data, and is considered too expensive to maintain. Object-oriented approaches, which organize software into interacting objects, were established to address the difficulties of software development and software maintenance. Object-oriented concepts evolved from academic research and analysis of basic characteristics of good software practice.
Cox defines the object-oriented paradigm as “a system building tool which puts the reusability of software at the center of the software development process” [Cox, 1986]. Wegner defines object-orientation in terms of objects, classes, and inheritance [Wegner, 1987]. He views an object-oriented system as being composed of a set of objects which are grouped into classes. Classes are defined by the inheritance hierarchy. Korson and McGregor define five basic concepts in the object-oriented paradigm [Korson and McGregor, 1990]. The five concepts are objects, classes, inheritance, polymorphism, and dynamic binding.
A class defines the template for objects, which are the basic system components in the object-oriented paradigm. Objects sharing the same definitions are grouped into classes. An object is an instance of a class. An object is anything that can be perceived and is relatively stable in form; it must be denotable by name, have a well-defined context and interface, and exhibit a persistent internal state; finally, an object and its operations cannot be considered independent of one another. It is this intimate association of an object with its operations that is the hallmark of object-oriented software. Polymorphism and inheritance are two aspects unique to OO systems.
Polymorphism means having the ability to take several forms. Booch defines polymorphism as a concept according to which a name (such as a variable declaration) may denote objects of many different classes that are related by some common superclass; thus, any object denoted by this name is able to respond to some common set of operations in different ways [Booch, 1986, 1994, 1996]. Jacobson et al. define polymorphism as a concept which means that the sender of a stimulus (message) does not need to know the receiving object’s class [Jacobson et al., 1993]. The receiving object can belong to an arbitrary class. The stimulus (message) can be interpreted in different ways, depending on the receiver’s class. It is the receiver of a stimulus that determines how a stimulus will be interpreted, not the transmitter. One of the major benefits of polymorphism is that the programmer does not have to comprehend, or even be aware of, existing operations to add a new operation to a class. Without polymorphism, the developer writes code consisting of large case or switch statements, which implies that the developer must be aware of existing operations in order to add a new operation. Polymorphism thus reduces the complexity of OO software.
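The contrast between polymorphic dispatch and a case or switch statement can be illustrated with a minimal Python sketch. The class and function names (Shape, Circle, Rectangle, total_area, total_area_switch) are hypothetical and are not taken from the cited authors; they only illustrate the idea that the receiver, not the sender, decides how a message is interpreted.

```python
import math
from abc import ABC, abstractmethod
from typing import Iterable


class Shape(ABC):
    """Hypothetical common superclass: every subclass answers `area`."""

    @abstractmethod
    def area(self) -> float:
        """Return the area of this shape."""


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes: Iterable[Shape]) -> float:
    # The sender never inspects the receiver's class; each object
    # interprets the `area` message in its own way.
    return sum(shape.area() for shape in shapes)


def total_area_switch(records: Iterable[dict]) -> float:
    """Non-polymorphic contrast: the caller must know every shape kind."""
    total = 0.0
    for record in records:
        if record["kind"] == "circle":
            total += math.pi * record["radius"] ** 2
        elif record["kind"] == "rectangle":
            total += record["width"] * record["height"]
        else:
            raise ValueError(f"unknown shape kind: {record['kind']}")
    return total
```

Adding a new shape to the polymorphic version only requires a new subclass, whereas the switch-based version requires editing every function that branches on the kind of shape.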
Inheritance is a reuse mechanism that allows programmers to define classes incrementally by reusing previously defined classes as the basis for new classes. Inheritance decreases the software complexity of OO systems, primarily because the programmer can reuse previously defined classes, including their variables and operations, as part of the definition of a new class. This reduces the number of operations and operands required. Structured systems do not have an inheritance mechanism as part of their formal specification.
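As a minimal sketch of incremental class definition, the Python fragment below defines a hypothetical SavingsAccount class on top of a hypothetical Account class. The names and the interest rule are assumptions made purely for illustration; the point is that the subclass restates none of the inherited variables or operations.

```python
class Account:
    """Hypothetical base class with two variables and two operations."""

    def __init__(self, owner: str, balance: float = 0.0) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> None:
        if amount < 0:
            raise ValueError("deposit must be non-negative")
        self.balance += amount

    def withdraw(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class SavingsAccount(Account):
    """Defined incrementally: only the new variable and operation appear."""

    def __init__(self, owner: str, balance: float = 0.0, rate: float = 0.02) -> None:
        super().__init__(owner, balance)  # reuse the inherited initialization
        self.rate = rate

    def add_interest(self) -> None:
        # Reuses the inherited `deposit` operation instead of redefining it.
        self.deposit(self.balance * self.rate)
```

SavingsAccount objects respond to deposit and withdraw even though the subclass body never mentions them, which is the source of the reduction in operations and operands described above.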
2.1.1 Object-Oriented Development Life Cycle
Object-orientation has had a profound effect on the software development life cycle, largely through successful exploration of the capabilities inherent in the paradigm. Two of the principal effects are:
· The concept of an object forces a different conceptualization of the problem and solution domains, and requires different analysis techniques.
· The property of encapsulation allows projects to proceed in an incremental fashion, where increments are far more insulated from each other than was previously possible.
It is beyond the scope of this dissertation to detail the variety of object-oriented development methodologies that have been proposed in recent years. Fortunately, the state of the practice has evolved to the point where a high degree of commonality has emerged among techniques, making it possible to present the salient points of most of them in overview. The remainder of this section is devoted to this overview, since it provides an understanding of how this life cycle differs from the traditional life cycle. The following text and diagram (Figure 2.1) are adapted from the Software Architects’ Synthesis (SASY) (c) development life cycle description [Korson and McGregor].
There are five main phases in the application development process:
· Domain Analysis: the process of coming to an understanding of the domain of the application. The domain is modeled, clarified, documented, and fundamental classes and their interrelationships are sorted out.
· Application Analysis: the process of determining exactly what will be built for the requested application. The requirements specifications are clearly spelled out in this phase.
· Application Design: the process where the mechanisms for implementing the actual application are determined. System architecture, data structures, efficiency considerations, etc., are all dealt with here.
· Class Development: classes are developed and tested in the language of implementation.
· Incremental Integration and System Testing: classes are integrated into cluster and subsystem modules and finally into a complete application. Integration testing verifies that these modules work correctly and effectively and that they meet requirements.
Many of these phases or steps are familiar from a traditional structured design process. The major difference is that all steps are performed repeatedly, as necessary, in multiple iterations. A number of iterations result in a complete, testable module or sub-module termed an increment. This iterative process of developing system increments continues until the system is complete. As later increments are implemented, it is inevitable that misconceptions and outright errors will be detected in increments developed earlier. Due to the insulating effect of encapsulation, all types of incorrect system behavior are corrected, when discovered, with much less violence to the overall design than would normally occur in a traditional development environment.
One complete increment of several iterations is diagrammed below.
· An increment is a defined subset of the total system functionality.
· An iteration is one pass over some set of activities.
The complete system is assembled from multiple increments.
Application Design takes a far more prominent role than coding in the OO life cycle, so much so that some OO methodologies do not even mention coding as a separate phase. One can, therefore, expect to find a similar bias in OO metrics, where design metrics become more prominent than code metrics. This is very different from traditional systems, where most metrics are code-based. The increased prominence of the design task also means that the level of detail available at the design level in an OO life cycle is much greater than what is available in a traditional system design. The iterative nature of the object-oriented product life cycle also presents metric opportunities and techniques not available with traditional production methods. Since analysis and design are tasks that are repeated throughout the life of the product, the opportunity exists for a class of early metrics. Object-oriented designs have been found to be conceptually more stable than designs produced with other techniques because the major system concepts usually do not change from the early stages of the design. Thus, estimates made at an early point in development can be refined rather than discarded.
For more discussion of object-oriented software development, please refer to the books by Booch, Coad and Yourdon, or Jacobson et al. [Booch, 1994, 1996; Coad and Yourdon, 1991a, 1991b; Coad, 1997; Jacobson et al., 1993]. All of these books provide an excellent discussion of object-oriented software development and describe specific methodologies.
2.2 Software Complexity Metrics
A software metric defines a standard way of measuring some attribute of the software development process. Size, cost, defects, communications, difficulty, and environment are all examples of such attributes, just as mass, length, and time are attributes in the physical world. "Kilo-lines of executable source" illustrates one metric for the size attribute. Metrics can be primitive (directly measurable or countable, such as counting lines of code) or computed (such as non-comment source statements/engineer/month) [Grady and Caswell, 1987].
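The distinction between primitive and computed metrics can be sketched in a few lines of Python. The function names are hypothetical, and the counting rule is a deliberate simplification (it counts non-blank lines that are not pure "#" comments, rather than parsing statements), so it should be read as an illustration rather than as the counting standard of Grady and Caswell.

```python
def count_ncss(source: str) -> int:
    """Primitive metric: count lines that are neither blank nor pure comments.

    Assumption: a line is a comment only if it starts with '#'.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def ncss_per_engineer_month(ncss: int, engineers: int, months: float) -> float:
    """Computed metric: derived from a primitive count and effort data."""
    if engineers <= 0 or months <= 0:
        raise ValueError("engineers and months must be positive")
    return ncss / (engineers * months)
```

For instance, 1200 counted statements produced by 2 engineers over 3 months gives a computed value of 200 statements per engineer per month.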
To many project managers, software metrics are a means for more accurate estimations of project milestones, as well as a useful mechanism for monitoring progress. As DeMarco says, "You cannot control what you cannot measure" [DeMarco, 1982].
There are four major reasons why one should be interested in metrics in general:
· One of the major reasons to measure the software development process is that it introduces more rigorous principles into the development process itself and results in a better understanding of the development process.
· A second reason to start a metrics program is that the measurement activities and tools together lead to a greater level of sophistication in software engineering techniques.
· Third, the use of common terminology leads to more consistent use of the most effective development environments and tools.
· Finally, we have a way to determine our progress. As we implement change, we expect to see measurable results.
Several complexity metrics exist for traditional systems, and several more are being proposed for object-oriented systems. Complexity metrics are designed to measure, in relative terms, the difficulty humans have in working with programs. The theoretical bases for these metrics include graph theory, information theory, human information processing theory, and general systems complexity theory. The following two subsections discuss complexity metrics in the procedural and object-oriented paradigms, respectively.
2.2.1 Complexity Metrics in the Procedural Paradigm
Several complexity metrics have been proposed to measure the complexity of a procedure, a function, or a program in the procedural paradigm. These metrics range from simple size metrics such as Lines of Code (LOC) to very complicated program structure metrics such as Robillard and Boloix’s statement inter-connectivity metrics [Robillard and Boloix, 1989]. This section presents some sample complexity metrics in the procedural paradigm.
Some of these complexity metrics are lexical measures, which count certain lexical tokens in a program. These metrics include Halstead’s software science complexity metrics and Bail’s size metric. Halstead defines the software science metrics in terms of four basic counts of lexical tokens in a program [Halstead, 1977]: the number of unique operators, the number of unique operands, the total occurrences of operators, and the total occurrences of operands. Bail defines program complexity in terms of the size of the program needed to describe an algorithm [Bail, 1988]. The size of the program is defined by the number of bits needed to describe the algorithm.
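A minimal sketch of the four basic Halstead counts follows. It assumes the tokens have already been classified as operators or operands, a step that is language-dependent and is not shown. The derived measures included here (vocabulary, length, and volume) are the standard software science formulas, which the text above does not spell out; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HalsteadCounts:
    n1: int  # number of unique operators
    n2: int  # number of unique operands
    N1: int  # total occurrences of operators
    N2: int  # total occurrences of operands

    @property
    def vocabulary(self) -> int:
        return self.n1 + self.n2

    @property
    def length(self) -> int:
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        # V = N * log2(n); defined as 0 for an empty program.
        if self.vocabulary == 0:
            return 0.0
        return self.length * math.log2(self.vocabulary)


def halstead_counts(operators: Sequence[str], operands: Sequence[str]) -> HalsteadCounts:
    """Compute the four basic counts from pre-classified token lists."""
    return HalsteadCounts(
        n1=len(set(operators)),
        n2=len(set(operands)),
        N1=len(operators),
        N2=len(operands),
    )


# Tokens of the statement `z = x + y * x`, classified by hand.
counts = halstead_counts(operators=["=", "+", "*"], operands=["z", "x", "y", "x"])
```

For this single statement the counts are 3 unique operators, 3 unique operands, 3 operator occurrences, and 4 operand occurrences, giving a vocabulary of 6 and a length of 7.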
Other measures are based on the analysis of patterns in a graph constructed from the control flows of a program. McCabe defines the cyclomatic complexity measure based on the control flows in a procedure/function [McCabe, 1976]. A directed graph is derived from the control flows of the procedure/function, and the cyclomatic complexity is the cyclomatic number of this graph, that is, the number of linearly independent paths through it. McCabe and Butler later extend the cyclomatic complexity to measure structure chart design [McCabe and Butler, 1989].
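McCabe's measure is usually computed as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components of the control-flow graph. The sketch below applies that formula to a graph given as an adjacency mapping; the representation and the two example graphs are assumptions made for illustration, and deriving the graph from source code is not shown.

```python
from typing import Mapping, Sequence


def cyclomatic_complexity(cfg: Mapping[str, Sequence[str]], components: int = 1) -> int:
    """Return V(G) = E - N + 2P for a control-flow graph.

    `cfg` maps each node to the list of its successor nodes.
    """
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)  # include nodes that only appear as targets
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2 * components


# Hypothetical control-flow graph of a single if/else statement.
IF_ELSE = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}

# Hypothetical control-flow graph of a single while loop.
WHILE_LOOP = {
    "entry": ["test"],
    "test": ["body", "exit"],
    "body": ["test"],
    "exit": [],
}
```

Both example graphs have four edges and four nodes, so each has a cyclomatic complexity of 2, while straight-line code with no decisions has a complexity of 1.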
Another group of metrics measures the inter-connection of system components. The inter-connection may be based on the statements or the components of a program such as procedures or functions. Some examples of such metrics are McClure’s invocation complexity, Henry-Kafura’s information flow complexity, Woodfield’s review complexity, and Robillard and Boloix’s statement inter-connection complexity.
McClure defines the invocation complexity based on the possible execution paths in a program and the difficulty of determining the path for an arbitrary set of input data [McClure, 1978]. Henry and Kafura define the information flow complexity based on the interconnectivity among procedures/functions [Henry and Kafura, 1981]. The inter-connectivity between two procedures/functions may be caused by a data structure or a procedure call. Woodfield defines review complexity on the basis of the difficulty of understanding a system. The more complex a system is, the more time it takes to understand it. “To understand the system, the software must be reviewed. The more times a system is reviewed, the less additional time it will take to understand the system” [Woodfield, 1980]. Robillard and Boloix define the complexity of a program in terms of the inter-connections among statements. The complexity is based on the information-theory concepts of entropy and excess entropy [Robillard and Boloix, 1989].
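The Henry and Kafura measure is commonly stated as length × (fan-in × fan-out)², where fan-in and fan-out count the information flows into and out of a procedure. The sketch below assumes these three values have already been obtained for each procedure; the Procedure record, the function names, and the example figures are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Procedure(NamedTuple):
    name: str
    length: int   # size of the procedure, e.g. in lines of code
    fan_in: int   # information flows into the procedure
    fan_out: int  # information flows out of the procedure


def information_flow_complexity(proc: Procedure) -> int:
    """Return length * (fan_in * fan_out) ** 2 for one procedure."""
    return proc.length * (proc.fan_in * proc.fan_out) ** 2


def rank_by_complexity(procs: Iterable[Procedure]) -> List[Procedure]:
    """Order procedures from most to least complex."""
    return sorted(procs, key=information_flow_complexity, reverse=True)


# Invented example values, used only to show the effect of the squared term.
parse = Procedure("parse", length=10, fan_in=2, fan_out=3)
log = Procedure("log", length=50, fan_in=1, fan_out=1)
```

Because the connectivity term is squared, the short but highly connected procedure (parse) scores higher than the longer but isolated one (log).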
Cant, Jeffery, and Henderson-Sellers present an approach to complexity metrics that is based on an understanding of the cognitive processes of analysts or programmers as they undertake the challenges of program development or maintenance [Cant, Jeffery, and Henderson-Sellers, 1995]. In their complexity model they attempt to quantify a number of cognitive processes involving the software developer or maintainer. The authors claim that initial studies show that this cognitive complexity model is applicable to object-oriented systems too [Cant, Henderson-Sellers, and Jeffery, 1994].
Almost all of the above work has been done in the context of the procedural paradigm. Since the object-oriented paradigm exhibits different characteristics from the procedural paradigm, software metrics in the object-oriented paradigm need to be studied separately: object-oriented concepts such as inheritance, classes, and message passing cannot be characterized by any of the metrics mentioned above.
2.2.2 Complexity Metrics in the Object-Oriented Paradigm
The current state of software metrics for object technology is best described by Taylor as "In Search of Metrics" [Taylor, 1993]. Very few metrics that have been validated for use in OO systems are available. Capers Jones, in his book Programming Productivity [Jones, 1986], describes the state of software metrics as "The search for a Science of Measurement." He summarizes the software measurement trends as: