Software Reliability

Aaron Hoff

Software Engineering

UW-Platteville

Abstract

This paper will briefly discuss what reliability means and why it is so important for hardware/software systems. Hardware and software reliability will then be compared, and the differences between the two will be pointed out. The paper will then discuss in greater detail how software reliability can be measured using different metrics, which in turn can be used in reliability models. Many different software reliability models have been created over the last few decades, and these models can be useful tools for determining the reliability of software. Two of these models will be discussed in detail: Mills' Error Seeding Model and the Jelinski-Moranda Model.

Introduction

Reliability is necessary for quality hardware/software systems. We, as a society, have become largely dependent on hardware/software systems, which have entered deeply into almost every aspect of modern life. This dependency may involve small tasks whose failure has no immediate harmful effect on human life or economic processes, but there are many systems in the world that can cause harm if they fail. These potential harmful effects of system failure increase the need for reliable systems.

First, reliability must be defined and software and hardware reliability must be compared. The comparison will help show what kinds of models may be needed for software reliability in relation to hardware reliability.

Reliability comparison between hardware and software

Webster’s Ninth New Collegiate Dictionary defines reliability as:

1)  The quality or state of being suitable or fit to be relied on; dependable (n);

2)  The extent to which an experiment, test, or measuring procedure yields the same result on repeated trials (n).

Hardware and software systems both have ways to be tested for reliability and ways to help ensure reliability. These testing approaches cannot be the same, though; each type of system must be tested for reliability in its own way on account of the following differences between hardware and software. One difference is that hardware is physical, whereas software is not. Likewise, hardware faults are mostly physical, whereas software faults are mostly design faults, which are much harder to visualize and detect. Another difference is that software does not fail, break, wear out over time, or fall out of tolerance like hardware does. Unlike hardware, the cause of a software failure is always systematic, not random: either an error of omission, an error of commission, or an operational error was committed. [10]

Figure 1: Bathtub curve for hardware reliability [5]

Figure 1 and Figure 2 are graphical representations of how hardware and software differ over time. The life expectancy of hardware depends on all of the components it is made up of. At first, hardware is very faulty, and all components must be tested thoroughly, which eventually brings the failure rate down. The hardware then enters its useful life, which can last for a certain period of time before its components start wearing down. [5] Once these components wear down, the system becomes very faulty and can no longer be considered reliable unless the parts are replaced with new and reliable components.

Figure 2: Revised bathtub graph for software reliability [5]

The software life cycle usually starts out much the same as the hardware life cycle, full of bugs and faults. These bugs and faults can be detected as the system is tested and can be corrected over time. Once software enters its useful life, upgrades are usually made to it, and these upgrades always create new possibilities for error, typically spiking the failure rate. More testing after each upgrade can then bring the failure rate back down. This pattern of upgrade-driven failure increases shows the need for constant testing and bug fixing to keep the failure rate down, whereas hardware's failure rate stays fairly constant throughout its useful life until the hardware wears down. Eventually the software settles into a consistent failure rate that can be monitored, whereas hardware components eventually fail completely, which cannot be helped. [5]

Figure 1 and Figure 2 show the differences between the two life cycles in terms of failure rate vs. time. In order to measure the reliability of these two kinds of systems, the right metrics or measurements must be chosen. Hardware metrics are easy to determine because everything on the hardware is physical, and physical objects have precise measurements and can easily be measured. Software, on the other hand, does not contain physical objects that can be measured. More detail on software reliability must be discussed before the models that help determine software reliability can be fully understood.

Software Reliability

What is software reliability?

Software reliability, according to the NASA Software Assurance Standard, is defined as a discipline of software assurance that:

1. Defines the requirements for software controlled system fault/failure detection, isolation, and recovery.

2. Reviews the software development processes and products for software error prevention and/or reduced functionality states; and,

3. Defines the process for measuring and analyzing defects and defines/derives the reliability and maintainability factors. [2]

Pham defines software reliability as the probability that software will not fail for a specified period of time under specified conditions. [8]
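
Written as a formula, this definition corresponds to the standard reliability function below, where T denotes the random time until the next software failure under the specified operating conditions (the notation here is illustrative, not quoted from [8]):

R(t) = \Pr(T > t), \qquad t \ge 0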

Why do we need software reliability?

Computers and computer systems have become an integral part of today’s society. Many people make use of and rely on computers or software controlled systems to accomplish day-to-day activities. These activities can range from games for enjoyment to applications that complete a specific task. Whatever the task or activity may be, users of any software or software controlled system heavily rely on these systems to perform exactly how the user expects. This places an extra duty on whoever creates a piece of software. The software must not only work as the user of the software/system needs it to work, but the software/system must also be reliable.

Examples of failed software are everywhere. Developers are not perfect and are prone to writing faulty code. One such example, which cost NASA money, time, and pride, is the loss of the Mars Climate Orbiter in 1999. NASA’s testing of the spacecraft’s software was not thorough enough. A lack of communication between the developers resulted in wrong calculations for distances: some wrote code using metric units and others wrote code using English units, and proper conversions were not made between the two, causing the spacecraft to be lost. [4] Failures like this demonstrate even more strongly the need for reliable software engineering and for ways to test how reliable a system will be over time.

Software Reliability Models

Many different models for software reliability have been proposed over the years. A variety of models have been invalidated due to incorrect assumptions, inaccuracy, impracticality, or costliness. [1] Software reliability models can be grouped into ten different groups: error seeding, failure rate, curve fitting, reliability growth, program structure, input domain, execution path, nonhomogeneous Poisson process, Bayesian and unified, and Markov. Two of these groups will be discussed in more detail: the error seeding models and the failure rate models. [8]

It will also be useful to understand what is meant by maximum likelihood estimation, or MLE. This estimation method is used to determine many unknown variables in reliability models. MLE is a popular statistical method for fitting a statistical model to data and providing estimates for the model’s parameters. [6] Probability is helpful for predicting outcomes, but it cannot by itself be used to estimate the parameters of a model: probability predicts an outcome from known parameter values, and parameter values are not always known. So, instead of using probability, the idea of likelihood was established, in which one observes data and then estimates the model’s parameters from that data. A fair amount of mathematics goes into determining each model’s MLE; therefore, the next section gives a brief discussion of the mathematics and of what goes into determining parameter values using MLE.

Maximum Likelihood Estimation

The principle of MLE is to estimate the parameter values of a model that make the observed data most likely to occur. Tossing a coin provides an example of how MLE works. Someone flips a coin 100 times, and heads comes up 56 of the 100 times. This is the observed data. Before the flipping even began, the assumed probability of flipping heads was 0.5. Given both this probability and the observed data, MLE can now determine the likelihood of this probability using the probability model found below. The likelihood of the observed data when the probability of flipping heads is 0.5 is 0.0389. [6]

L(p = 0.5 \mid \text{data}) = \frac{100!}{56!\,44!}\,(0.5)^{56}(0.5)^{44} = 0.0389

Figure 3: Using the probability model to determine likelihood of probability at 0.5 [6]

If a different value is used for the probability that heads will be tossed, such as 0.52, a larger likelihood value is obtained. See Figure 4.

L(p = 0.52 \mid \text{data}) = \frac{100!}{56!\,44!}\,(0.52)^{56}(0.48)^{44} = 0.0581

Figure 4: Using the probability model to determine likelihood of probability at 0.52 [6]

A variety of probability values can then be plugged into the probability model, allowing a graph to be made. The graph for this coin example should look something like Figure 5.

Figure 5: Graph of Likelihood vs. probability in coin flip (MLE) [6]

From Figure 5, the probability value with the largest likelihood can be read off; for this data it is 0.56, the observed proportion of heads. This maximum likelihood estimate can now be used as a better parameter value than the initially assumed probability of 0.5.
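
As an illustration of the coin example (the code and the grid of candidate values are ours, not from [6]), the short Python sketch below computes the binomial likelihood for candidate probability values and keeps the one that makes the observed data most likely.

from math import comb

flips, heads = 100, 56  # observed data from the coin example

def likelihood(p):
    # Binomial likelihood of observing 56 heads in 100 flips for a given p
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

print(likelihood(0.50), likelihood(0.52))  # roughly the 0.0389 and 0.0581 values of Figures 3 and 4

# Evaluate the likelihood over a grid of candidate probabilities and
# keep the one that maximizes it (the maximum likelihood estimate).
grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=likelihood))  # 0.56, the observed proportion of heads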

The process of finding an MLE can be carried out for almost any parameter and is used in almost all software reliability models to determine probable values for parameters that are otherwise unknown. With MLE covered, the next things to discuss are two of the groups of models: the error seeding models and the failure rate models.

Error Seeding

The error seeding group of models estimates the number of errors in a program by using a multistage sampling technique. This technique divides errors into inherent errors and induced errors. [7] An inherent error is one already present in a program that will cause it to fail regardless of what the user does. An induced error is one intentionally inserted into a piece of software for the purpose of estimating the total number of inherent errors.

Mills’ Error Seeding Model

Mills’ error seeding model (Mills, 1970) estimates the number of errors in a program by deliberately introducing errors into it. These seeded errors are placed in such a way that, when testing for them, every part of the system is exercised and tested for all errors, inherent and induced. [7] Basically, a group of developers/testers comes up with different errors they believe could occur in the system and places them throughout it, noting every induced error. They then test the whole software system for all errors, using the induced errors as mandatory checking points. Throughout this testing process, not only are the induced errors found, but many inherent errors are also caught. Every error found after each test is then removed completely from the program. From the debugging data, which consists of inherent errors and induced errors, the unknown number of inherent errors can be estimated. [7]

If both inherent errors and induced errors are equally likely to be detected, then the probability of finding k induced errors among r removed errors follows a hypergeometric distribution, which is given by

P(k;\, N, n_1, r) = \frac{\binom{n_1}{k}\binom{N}{r-k}}{\binom{N+n_1}{r}}, \qquad k = 1, 2, \ldots, r

Figure 6: Hypergeometric distribution used to find probability of k induced errors [7]

where

N = total number of inherent errors

n1 = total number of induced errors

r = total number of errors removed during debugging

k = total number of induced errors in r removed errors

r – k = total number of inherent errors in r removed errors [7]
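
As a small numerical illustration of this distribution (the numbers below are hypothetical, not taken from [7]), the following Python sketch computes the probability of finding k induced errors among the r errors removed during debugging.

from math import comb

def prob_induced(k, N, n1, r):
    # Hypergeometric probability of k induced errors among r removed errors,
    # given N inherent errors and n1 seeded (induced) errors in the program
    return comb(n1, k) * comb(N, r - k) / comb(N + n1, r)

# Hypothetical debugging scenario: 100 inherent errors, 20 seeded errors,
# 30 errors removed in total, 5 of which were seeded.
print(prob_induced(k=5, N=100, n1=20, r=30))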

Because n1, r, and k are all known from the debugging data, this distribution of Mills can be used to derive a simplified estimation equation.

\hat{N}_0 = \frac{n_1\,(r - k)}{k}

Figure 7: Simplified equation to determine total # of inherent errors [7]

This simplified equation allows for estimation of the total number of inherent errors, which gives a good sense of the level of reliability of the specific software being tested.
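
Reusing the hypothetical numbers from the sketch above, the simplified equation of Figure 7 then gives an estimate of N directly; the helper below is ours, not from [7].

def estimate_inherent_errors(n1, r, k):
    # Estimated total number of inherent errors: N0 = n1 * (r - k) / k
    return n1 * (r - k) / k

# 20 seeded errors, 30 errors removed during debugging, 5 of them seeded
print(estimate_inherent_errors(n1=20, r=30, k=5))  # 100.0 estimated inherent errors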

Mills’ research and the model he constructed can be very useful for estimating the number of errors that could be plaguing a system after testing. There are some problems with this method, though. First, this way of finding errors is used during the testing process, which is an expensive part of the software development process when changes need to be made. Another criticism of the method is its inability to determine the exact type, location, and difficulty level that the induced errors should have so that they are equally likely to be found as the inherent errors. [7]

Failure Rate

First, some terms need to be defined to fully understand this group of models. Failure rate is defined as the frequency with which an engineered system or component fails. A failure occurs when the user perceives that the program ceases to deliver the expected service. A fault is the cause of the failure, i.e., the internal error of a program. [3] Failure rate models are among the earliest models proposed for determining software reliability. The basic premise is that the times between successive failures will get longer as faults are removed from the software system. This is not guaranteed in practice, because failure times are random: one can never determine exactly when a failure will occur in a system. Not only are failure times random, but the observed values are also subject to statistical fluctuation. [9] These properties make failure rates tough to model, but not impossible. One of the most commonly used models of the failure rate per fault is the Jelinski and Moranda (JM) De-Eutrophication Model. [9]
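
To illustrate this premise concretely, the sketch below simulates inter-failure times under the simplifying assumption (common to failure rate models such as JM) that each remaining fault contributes a fixed, equal amount to the program's overall failure rate; the fault count and per-fault rate are invented for illustration.

import random

def simulate_interfailure_times(total_faults=20, per_fault_rate=0.05, seed=1):
    # Assume the overall failure rate is proportional to the number of faults
    # still remaining, and that each detected fault is removed immediately.
    random.seed(seed)
    gaps = []
    for remaining in range(total_faults, 0, -1):
        rate = per_fault_rate * remaining        # current overall failure rate
        gaps.append(random.expovariate(rate))    # time until the next failure
    return gaps

gaps = simulate_interfailure_times()
print([round(g, 1) for g in gaps])
# Individual gaps are random, but on average the later gaps (few faults left)
# are longer than the early ones (many faults left).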