Lesson 7.1 - Teacher Information

Engineering for Reliability

Introduction:

Things that are designed and built are expected to meet certain performance criteria. The degree to which an object meets its designed performance criteria every time it is used is called reliability. Designing for higher reliability is straightforward, but it adds to the production cost of the object, so engineering decisions must take the costs involved into account. Other factors the engineering process must take into account when addressing reliability are: the consequences of failure, the frequency of use, the conditions under which the object is used, and the worst-case scenario under which the object can reasonably be expected to perform.

Statistical Concepts:

For any event, there is a probability that the outcome will be a failure.

Failure rates can be determined for items by testing.

Failure rates can be predicted for new products based on experience with similar products.

Determining failure rates

For objects that are mass-produced in large quantity, it is usually not worthwhile testing each one. Samples from the production line can be tested, and from these tests a failure rate for the entire production run can be determined.

Failure rates are expressed either as a mathematical probability that an object will fail on the next use or as a function of time. Usually, items in continuous use will have failure rates expressed as Mean Time Between Failures (MTBF).

From a statistical perspective, an item has a probability of failing (Pf, the probability of failure). This number is between zero (never fails) and one (always fails). Mathematically, 0 ≤ Pf ≤ 1. In reality, you can never achieve Pf = 0, but Pf = 1 is possible.

The same item’s probability of not failing (Ps, the probability of success) for any single use or in a given period of time is one minus the probability of failing: Ps = 1 − Pf. This is because there are only two possible outcomes: either it fails or it does not.

For machines made of multiple parts that depend on each other, you can mathematically determine the overall success or failure probability.

Example: Calculate the failure rate of a car’s airbag system. In this system, a crash detector sends a signal to an actuator, and the actuator triggers the inflator. (We will not consider any other parts in this introductory example.) When tested separately in a lab, the detector has a failure rate of one in a thousand (1/1000 = 0.001), the actuator fails once in 2,500 tests (1/2,500 = 0.0004), and the inflator once in 750 tests (1/750 = 0.00133).

The engineer (and the consumer) is concerned with the airbag functioning when it is supposed to. If any component fails, the system as a whole has failed.

In order to correctly calculate the overall system reliability, we must look at the probability that each component will not fail and successfully pass on to the next component the action needed when a crash occurs.

detector → actuator → inflator

This, in terms of failure rates, is

Ps(system) = Ps(detector) × Ps(actuator) × Ps(inflator)

= (1 − 0.001) × (1 − 0.0004) × (1 − 0.00133)

= 0.999 × 0.9996 × 0.99867

= 0.99727

This is the answer: the system reliability is 0.99727, which means that in a crash there is a 0.00273 chance that some part of the system will fail, rendering the entire system a failure. The answer can also be expressed as a 0.273% chance, or a 1 in 366 failure rate (366 is the reciprocal of 0.00273).
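The series-system arithmetic above can be checked with a short script. A minimal sketch, using the failure rates given in the example:

```python
# Series system: every component must work for the system to work.
# Component failure rates from the airbag example.
detector_pf = 1 / 1000   # 0.001
actuator_pf = 1 / 2500   # 0.0004
inflator_pf = 1 / 750    # about 0.00133

# System reliability is the product of each component's success probability.
ps_system = (1 - detector_pf) * (1 - actuator_pf) * (1 - inflator_pf)
pf_system = 1 - ps_system

print(round(ps_system, 5))   # reliability, about 0.99727
print(round(pf_system, 5))   # failure rate, about 0.00273
print(round(1 / pf_system))  # about 1 failure in 366 crashes
```

Multiplying success probabilities (rather than adding failure rates) is what makes the result exact; adding the three Pf values would only approximate the answer.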

Identifying Critical Components

In a system, the critical component is the one most likely to fail: the weakest link, the one with the largest Pf. If an engineer or corporate executive decides to improve a system’s reliability, the component to focus on is the critical component.

Redundancy

One strategy to improve overall system reliability is to add redundant components. Redundancy is defined as multiple components, any of which can support system operation on its own even in the event of the failure of another redundant component.

For example, your car has four tires. If any tire goes flat, the car cannot be driven. Even though the four tires are identical and serve the same function, they are not redundant, since the failure of any one results in total failure of the system. An 18-wheel truck, on the other hand, has eight pairs of redundant tires. Except for the front tires, all the others are arranged in pairs at each axle end. For the truck, if any tire (except a front one) goes flat, the truck can still be driven. Note that all 16 of those tires are not completely redundant with each other: if both tires on the same side of any one axle go flat, the truck cannot be driven. There are eight separate pairs of redundant tires.
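For a redundant pair, the pair fails only if both members fail, so the pair’s failure probability is the product of the individual failure probabilities. A small sketch, using a made-up per-trip flat-tire probability (the value is purely illustrative, not from the text):

```python
# Redundant (parallel) components: the pair fails only if BOTH fail.
tire_pf = 0.002  # hypothetical chance one truck tire goes flat on a trip

pair_pf = tire_pf * tire_pf  # both tires at one axle end must fail
pair_ps = 1 - pair_pf

print(pair_pf)  # roughly 4e-06, far smaller than a single tire's 0.002
```

This is why redundancy is so effective: squaring a small failure probability makes it dramatically smaller.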

Calculating failure rates for more complex systems

Engineers frequently need to achieve extremely high reliability in the systems they design. They accomplish this through a combination of methods such as improving individual components, adding interlocks to prevent operator error, and adding redundancy for critical components that cannot be made sufficiently reliable individually. When evaluating these systems mathematically, it is helpful to sketch out the relationships between the parts. For the airbag example above, the component relationship sketch would look like this:

detector → actuator → inflator

The initial evaluation of this system found it would fail to inflate the airbag once in 366 crashes. The auto company’s safety engineers or executives may decide that this is an unacceptably high failure rate. The designers of each component can be asked to go back, determine why their components fail, and possibly redesign them to increase each component’s reliability. This is typical of the Product Development Lifecycle introduced in Section 1. For the purposes of this example, let us say that this is done, and only the inflator can be improved upon. Its reliability is quadrupled, and its new failure rate is 1 in 3,000. Using the improved inflator, system reliability improves to 0.99826 (i.e., the failure rate is 0.00174; use the equations to prove this to yourself).
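To prove the improved figure to yourself, the same series calculation can be repeated with the new inflator failure rate:

```python
# Re-check the system with the improved inflator (1 in 3,000 instead of 1 in 750).
detector_pf = 0.001
actuator_pf = 0.0004
inflator_pf = 1 / 3000  # improved inflator

ps_system = (1 - detector_pf) * (1 - actuator_pf) * (1 - inflator_pf)
pf_system = 1 - ps_system

print(round(ps_system, 5))  # about 0.99827 (the text truncates to 0.99826)
print(round(pf_system, 5))  # about 0.00173
```

Note the small discrepancy with the text comes only from truncating versus rounding the fifth decimal place.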

This may still be unacceptable. The failure rate is a little less than 1 in 500. If no more individual component improvements are possible, redundant components can be added. The critical component is now the detector, with a 0.001 failure rate. If two redundant detectors were used, the system sketch would look like:

detector 1 ─┐
            ├─→ actuator → inflator
detector 2 ─┘

The system reliability equation becomes:

Ps(system) = [1 − (0.001)²] × (1 − 0.0004) × (1 − 1/3000) = 0.99927

This improves the overall system failure rate to 1 in 1362, which is better than 1 in a thousand. System reliability is now 0.99927, or 99.927% reliable.
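The combination of a parallel (redundant) stage and series stages can be verified the same way:

```python
# Two redundant detectors in parallel, then actuator and improved inflator in series.
detector_pf = 0.001
actuator_pf = 0.0004
inflator_pf = 1 / 3000

detectors_pf = detector_pf ** 2  # both detectors must fail for detection to fail
ps_system = (1 - detectors_pf) * (1 - actuator_pf) * (1 - inflator_pf)
pf_system = 1 - ps_system

print(round(ps_system, 5))   # about 0.99927
print(round(1 / pf_system))  # about 1 failure in 1362 crashes
```

The pattern generalizes: multiply Pf values within a redundant group, then multiply the resulting Ps values along the series chain.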

Second example

A hospital administrator says she wants a “one in a million” chance of the emergency power system failing in the event of a power company blackout. The emergency power system consists of diesel generators and automatic circuit breakers. The failure rate of the generators is 0.001 and the circuit breakers have a Pf of 0.000025. Determine how many diesel generator sets are required.

Answer: The component relational sketch is:

generator → circuit breaker

From the sketch, with one of each component, we get the equation

Ps = (1 − 0.001) × (1 − 0.000025) = 0.999 × 0.999975 = 0.998975

so Pf = 1 − 0.998975 = 0.001025. This number is more than one in a million, so this single-component configuration is unacceptable.

Since the hospital has no control over the individual components’ reliability, they must consider adding redundancy. Because EACH generator needs its own breaker, we can consider the combination of a generator and its breaker as a single component. We can easily compute the reliability of this “component” to be 0.998975 (this is 1 − 0.001025).

Now the question simply becomes: how many redundant DG sets would bring reliability up to 0.999999 (the reliability corresponding to a “one in a million” failure rate)?

Using

Ps = 1 − (Pf)^n

solve for n when Ps = 0.999999 and Pf = 0.001025. This requires (0.001025)^n ≤ 0.000001; taking logarithms, n ≥ log(0.000001) / log(0.001025) ≈ 2.01. Since you can only have a whole number of generators, try n = 2: Ps = 1 − (0.001025)² = 0.9999989.

This is about 1.1 failures in a million. Is this good enough? The administrator will have to decide; it is certainly close to the desired criterion. To strictly meet the “one in a million” criterion, you would need three redundant generators, each with its own breaker.

This results in a Ps of 1 − (0.001025)³ = 0.999999998923, which amounts to about one failure in a billion.
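The search for the smallest acceptable number of generator sets can be sketched in a few lines (here the combined generator-plus-breaker failure rate is computed exactly rather than rounded to 0.001025):

```python
# Find the smallest number of redundant generator/breaker sets that meets
# the "one in a million" failure-rate target.
set_pf = 1 - (1 - 0.001) * (1 - 0.000025)  # one generator plus its breaker
target_pf = 1e-6

n = 1
while set_pf ** n > target_pf:  # the system fails only if ALL n sets fail
    n += 1

print(n)               # 3 sets are needed to strictly meet the target
print(1 - set_pf ** n) # about 0.999999998923
```

At n = 2 the failure rate is about 1.05 in a million, just over the target, which is why the loop continues to n = 3.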

Technical note: This calculation is common in places such as hospitals, where absolute electric reliability is necessary because lives depend on it. Where multiple generators are redundant, an additional time-delay circuit of a few seconds is placed on the second and successive redundant generators to prevent them all from starting at once and “fighting” to pick up the electric load. Thus, the second generator starts only when power fails AND the first DG set also fails, the third one starts only if both the first and second fail, and so on.