37
Introduction
This paper documents the results of a large number of Monte Carlo simulations exploring the behavior of different variable-data statistical control charts. The experiments were designed to determine whether these charts perform significantly differently when applied to data drawn from three common distributions.
This paper consists of five sections and an appendix. This first section, the introduction, gives an overview of the techniques used during this project and outlines the format of the paper. The second section states the purpose and goals of this effort. The third section describes how the project was carried out and how the results were generated. The fourth section presents the results, and the fifth contains the conclusions drawn from them. The appendix contains listings of the software tools used to generate the results.
This introduction discusses the general use of statistical control charts for variable data, details the mean and dispersion charts explored in this project, gives examples of the three data distributions we worked with, and explains how operating characteristic charts are used to compare the power of different control charts. The next paragraph gives a brief discussion of the general use of control charts.
The charts
There are many different types of charts used in statistical process control, such as X-bar, Range, Standard Deviation, Variance, XmR, u, c, and p. Each of these charts is suited to monitoring a specific type of data. In general, data can be divided into two types. Variable data comes from physical measurements that have a continuous range, such as length or weight. Attribute data is usually binary (such as pass/fail), a count (such as the number of defects), or a rate (such as the number of defects per inspected item). This paper focuses on the statistical process control charts that handle variable rather than attribute data.
There are two general categories of variable data charts: those that track a measure of the underlying process’s central tendency, and those that track a measure of its dispersion. Both types are discussed in the following paragraphs.
Central tendency charts
First, we will discuss the charts that handle the central tendency of the process. Although there are several possible measures of central tendency (mean, median, and mode), the statistical process community uses the mean in almost all cases.
The X-Bar chart is the chart that works with the mean of a given process’s variable data. The simplest X-Bar chart consists of a centerline and two limit lines. The centerline denotes the computed average of the measurements being controlled (more on this later). The area between the two limit lines is where most of the plotted points should fall. In the typical statistical process control application, the limits are placed three standard deviations (of the plotted mean) above and below the centerline, so for a normally distributed, in-control process one would expect 99.73% of the plotted points to fall between them. A sample X-Bar chart is shown on the next page. The green line in the center is the estimate of the process mean. The upper and lower red lines are the upper and lower control limits. In this example a single point, number 44, is out of control.
In order to compute the proper values for the limit lines, we need the standard deviation of the process’s measurements. Since there is no way to determine the population standard deviation without exhaustively measuring the entire population of the process under examination, we need a good estimate for the population standard deviation.
The statistical process community uses two methods of obtaining this estimate. The first is based on the range of the measurements within each sample; the second is based on their standard deviation. Obtaining the estimate from the range is operationally simple, but because the range uses only the largest and smallest measurements in each sample, its accuracy suffers relative to the standard deviation when large sample sizes are used. Estimates based on the standard deviation are somewhat more difficult to compute, but they use every measurement and so become increasingly accurate as sample sizes grow.
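As a sketch of the two estimates just described (not the project's own software, which is listed in the appendix), the Python below computes the range-based estimate R-bar/d2 and the standard-deviation-based estimate s-bar/c4 from a list of subgroups. The d2 values are the standard tabulated constants for subgroup sizes 2 through 10, and c4 is computed from its gamma-function formula; the function names are hypothetical.

```python
import math
import random
import statistics

# Tabulated d2 constants (expected range of n standard normal values), n = 2..10.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def c4(n):
    """Bias constant with E[s] = c4(n) * sigma for normal subgroups of size n."""
    # lgamma keeps the gamma-function ratio finite for large n.
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def sigma_from_range(subgroups):
    """Estimate sigma as R-bar / d2; the subgroup size must be in the D2 table."""
    n = len(subgroups[0])
    r_bar = statistics.fmean(max(g) - min(g) for g in subgroups)
    return r_bar / D2[n]


def sigma_from_stdev(subgroups):
    """Estimate sigma as s-bar / c4, with s the (n - 1) sample standard deviation."""
    n = len(subgroups[0])
    s_bar = statistics.fmean(statistics.stdev(g) for g in subgroups)
    return s_bar / c4(n)


if __name__ == "__main__":
    rng = random.Random(0)
    # 100 simulated subgroups of size 5 from a normal process with sigma = 1.
    groups = [[rng.gauss(10.0, 1.0) for _ in range(5)] for _ in range(100)]
    print("range-based estimate:", round(sigma_from_range(groups), 3))
    print("stdev-based estimate:", round(sigma_from_stdev(groups), 3))
```

Both estimates should land near the true value of 1 for this simulated process; the range-based one is restricted to the table's subgroup sizes, which is one practical reason the standard-deviation form scales better.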
This paper addresses X-Bar charts that use the sample standard deviation to estimate the population standard deviation. While exploring both methods was technically feasible, resource limitations forced the experimenter to choose a single method. (During the presentation, Professor Tsao specifically asked about the relative performance of the range-based X-Bar chart versus the standard-deviation-based X-Bar chart, so a single chart was created and is included as the second appendix.)
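A common textbook construction for the standard-deviation-based X-Bar chart places the limits at X-double-bar ± A3·s-bar, where X-double-bar is the grand mean, s-bar is the average subgroup standard deviation, and A3 = 3/(c4·√n). The sketch below assumes that construction; the project's own tool may have differed in detail, and `xbar_limits` is a hypothetical name.

```python
import math
import statistics


def c4(n):
    """Bias constant with E[s] = c4(n) * sigma for normal subgroups of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def xbar_limits(subgroups):
    """(LCL, centerline, UCL) for an X-Bar chart whose sigma estimate is s-bar / c4."""
    n = len(subgroups[0])
    grand_mean = statistics.fmean(statistics.fmean(g) for g in subgroups)
    s_bar = statistics.fmean(statistics.stdev(g) for g in subgroups)
    # A3 converts s-bar into three standard errors of a subgroup mean.
    a3 = 3.0 / (c4(n) * math.sqrt(n))
    return grand_mean - a3 * s_bar, grand_mean, grand_mean + a3 * s_bar
```

Called on a set of baseline subgroups, `xbar_limits` returns the centerline (green) and the two limit lines (red) of a chart like the sample X-Bar chart described above.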
Dispersion charts
Next, we discuss the charts that handle the dispersion of the process. Again, there are several possible measures of dispersion (range, standard deviation, and variance); the statistical process community uses the range and the standard deviation in almost all cases.
This project explored control charts based on all three measures of dispersion. Range charts are widely used because they are simple to construct and work well with small sample sizes. Standard deviation charts are a bit more complex to construct but are thought to perform better with large sample sizes; they also handle variable sample sizes. We also used control charts based on the variance. Variance charts are a bit more complex to construct than standard deviation charts, but we expected them to perform better against non-normal distributions.
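The sketch below shows one way to compute limits for two of these dispersion charts. The s-chart limits use the standard B3 and B4 factors, derived here from c4. The paper does not say how its variance chart was built, so the variance limits below are an assumption: normal-theory three-sigma limits based on Var(s²) = 2σ⁴/(n − 1). Chi-square probability limits are an equally common alternative. Function names are hypothetical.

```python
import math


def c4(n):
    """Bias constant with E[s] = c4(n) * sigma for normal subgroups of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def s_chart_limits(s_bar, n):
    """(LCL, centerline, UCL) = s-bar * (B3, 1, B4); a negative LCL is clipped to zero."""
    # k * s_bar is three standard deviations of s, with sigma estimated as s_bar / c4.
    k = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    return max(0.0, s_bar * (1.0 - k)), s_bar, s_bar * (1.0 + k)


def variance_chart_limits(var_bar, n):
    """Assumed 3-sigma limits for the sample variance of normal data.

    For normal data Var(s^2) = 2 * sigma^4 / (n - 1), so the standard deviation of
    s^2 is var_bar * sqrt(2 / (n - 1)) when var_bar estimates sigma^2.
    """
    k = 3.0 * math.sqrt(2.0 / (n - 1))
    return max(0.0, var_bar * (1.0 - k)), var_bar, var_bar * (1.0 + k)
```

For subgroups of five or fewer, the computed s-chart lower limit is negative and is clipped to zero, so the lower control limit sits on the horizontal axis. Under this assumed construction the variance chart's lower limit stays at zero until subgroups exceed 19 values.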
An example s chart is shown on the next page. In this example, the process’s average standard deviation is shown in green and the upper control limit in red. In this instance the lower control limit is zero, so it coincides with the horizontal axis. No points in this example are out of control.
The distributions
In a perfect world, measuring the output of a process would yield exactly the same number every time: the product would be precisely the correct size, and there would be no noise in the measurement. Unfortunately, many natural and artificial events occur during a process, and these events blur the exact size and precise measurement we would like to have. When one looks at a large number of these blurred measurements, an underlying structure becomes clear. These underlying structures, determined by the combined effects of these events, are probability distributions.
One useful property of a process’s underlying probability distribution is that it strongly reflects the process’s overall environment. A sharp change in the environment will change the measurements, and these measurements, in turn, will form a different distribution. For example, if a given process is running well and subject only to natural variation, its underlying distribution will typically be approximately normal, with a characteristic mean and standard deviation. However, if some part of the process slowly wears out, the mean of the underlying distribution is very likely to change. So by monitoring characteristics of a process’s underlying distribution, one can detect changes in the process itself.
This project focused on three probability distributions: the normal, the uniform, and the exponential. The most common is the normal distribution, which arises when a large number of unknown events occur, each of which has a small effect on the process. The normal distribution has a well-known bell shape. The uniform distribution is flat and arises when a single event has an equal chance of producing any one of a number of outcomes. The roll of a single die, with an equal chance of showing one through six, is a good example of a uniform distribution. The exponential distribution describes the waiting time between events when there are a large number of potential events, each with an equal likelihood of happening, and each event happens independently of the others.
Examples of these distributions are shown on the next three pages.
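The three distributions, with the mean of 10 and standard deviation of 1 used in the examples on the following pages, can be sampled with Python's standard `random` module as sketched below. The uniform bounds follow from the fact that a uniform distribution on [a, b] has standard deviation (b − a)/√12. An unshifted exponential distribution always has a standard deviation equal to its mean, so a mean of 10 with a standard deviation of 1 requires a shift; the code assumes 9 + Exp(1), which is our assumption rather than something the paper states. The function names are hypothetical.

```python
import math
import random
import statistics

MEAN, SD = 10.0, 1.0  # parameters used in the example histograms


def normal_sample(rng):
    return rng.gauss(MEAN, SD)


def uniform_sample(rng):
    # Uniform on [a, b] has standard deviation (b - a) / sqrt(12),
    # so a half-width of sqrt(3) * SD gives the requested spread.
    half_width = math.sqrt(3.0) * SD
    return rng.uniform(MEAN - half_width, MEAN + half_width)


def shifted_exponential_sample(rng):
    # Exp(rate) has mean = sd = 1 / rate. Assumed here: shift it right by
    # MEAN - SD so the result has mean MEAN and standard deviation SD.
    return (MEAN - SD) + rng.expovariate(1.0 / SD)


if __name__ == "__main__":
    rng = random.Random(42)
    for draw in (normal_sample, uniform_sample, shifted_exponential_sample):
        xs = [draw(rng) for _ in range(10_000)]
        print(f"{draw.__name__:28s} mean={statistics.fmean(xs):.3f} "
              f"sd={statistics.stdev(xs):.3f}")
```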
The Normal Distribution
The graph on the left is a histogram of 10000 samples drawn from a normal distribution with a mean of ten and a standard deviation of one. The graph on the right is a histogram of 10000 means, each computed from a sample of size ten drawn from the same distribution. A maximum-likelihood normal fit is also shown in red. Both charts are on the same scale. They illustrate one behavior of the central limit theorem: as the number of measurements averaged together increases, the standard deviation of the resulting means decreases, in proportion to one over the square root of the sample size.
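The shrinking spread can be checked numerically. This Python sketch assumes the same normal(10, 1) process and uses hypothetical helper names; it compares the standard deviation of 10000 raw draws with that of 10000 means of size ten, for which theory predicts 1/√10 ≈ 0.316.

```python
import math
import random
import statistics


def subgroup_means(draw, n, count, rng):
    """Return `count` means, each of `n` values produced by draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(count)]


def normal_10_1(rng):
    """One draw from the normal(10, 1) process used in the example."""
    return rng.gauss(10.0, 1.0)


rng = random.Random(1)
raw = [normal_10_1(rng) for _ in range(10_000)]
means = subgroup_means(normal_10_1, 10, 10_000, rng)

print(f"sd of raw values:    {statistics.stdev(raw):.3f}")    # near 1
print(f"sd of size-10 means: {statistics.stdev(means):.3f}")  # near 1/sqrt(10)
print(f"1/sqrt(10):          {1 / math.sqrt(10):.3f}")
```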
The Uniform Distribution
The graph on the left is a histogram of 10000 samples taken from a uniform distribution with a mean of 10 and a standard deviation of 1. The uniform distribution is “flat”: every value between the minimum and maximum has an equal chance of occurring. The graph on the right is a histogram of 10000 means, each computed from a sample of size ten drawn from the same distribution. A maximum-likelihood normal fit is shown in red. Both charts are on the same horizontal scale but have different vertical scales. These graphs illustrate the main behavior of the central limit theorem: although the samples are taken from a flat distribution, the distribution of their means approaches a normal distribution as they are grouped together. This implies that, with large enough sample sizes, the sample means from any underlying distribution with a finite variance can be treated as approximately normal. While this makes intuitive sense for relatively benign, symmetric distributions like the normal and the uniform, the next graphs show similar results using the highly non-symmetric exponential distribution.
The Exponential Distribution
The graph on the left is a histogram of 10000 samples taken from an exponential distribution with a mean of 10 and a standard deviation of one. The graph on the right is a histogram of 10000 means, each computed from a sample of size ten drawn from the same distribution. A maximum-likelihood normal fit is shown in red. Both charts are on the same scale. The exponential distribution is highly non-symmetric, with both the median and the mode much lower than the mean. Intuitively, nothing in the left histogram indicates a “bump” forming around the mean of ten. However, the distribution of the means of groups of ten samples is much more symmetric, with a decided peak around ten.
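Skewness puts a number on this change in shape. The Python sketch below computes the moment-based sample skewness of raw exponential draws and of means of ten. Skewness does not depend on shift or scale, so the theoretical values (2 for the exponential, 2/√10 ≈ 0.63 for means of ten) hold whichever exponential parameterization the project used; the shifted form in the code (9 + Exp(1), giving mean 10 and standard deviation 1) is an assumption, and the helper names are hypothetical.

```python
import math
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean((x - m) ** 2 for x in xs)
    m3 = statistics.fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5


rng = random.Random(7)


def draw():
    """One value from the assumed shifted exponential 9 + Exp(1): mean 10, sd 1."""
    return 9.0 + rng.expovariate(1.0)


raw = [draw() for _ in range(10_000)]
means = [statistics.fmean(draw() for _ in range(10)) for _ in range(10_000)]

print(f"raw skewness:          {skewness(raw):.2f}  (theory 2)")
print(f"size-10 mean skewness: {skewness(means):.2f}  (theory {2 / math.sqrt(10):.2f})")
```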
Operating Characteristics Charts
Since part of this project was to compare the “power” or “performance” of the various control charts under different circumstances, we needed some way to measure their characteristics. We chose to do this by comparing the different charts’ operating characteristics.
An operating characteristic chart shows the probability that a given method will miss a real change in the process, as a function of the size of that change. For example, if a process shifts by ten percent but our detection method is tuned to flag only shifts of about 99 percent, the method is unlikely to detect the change. The same method would do much better if the shift were 100 percent. An example chart is given below:
The horizontal axis gives the size of the change in the process; in this case, the number of standard deviations by which the process mean shifted. The vertical axis shows the probability that the chart misses the change. In this example, seven lines show the chart’s characteristics for sample sizes of 2, 3, 5, 10, 20, 30, and 50, from right to left. The blue (rightmost) line shows that with a sample size of two, this control chart needs a shift of more than two standard deviations before it has a 50 percent chance of detecting the change. With a sample size of 50 (the black, leftmost line), the same chart needs a shift of less than 0.8 standard deviations for a 50 percent chance.
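For a normally distributed process with known mean and standard deviation, the operating characteristic of a three-sigma X-Bar chart has a closed form: β(δ) = Φ(3 − δ√n) − Φ(−3 − δ√n), where δ is the mean shift in process standard deviations and Φ is the standard normal CDF. The sketch below evaluates it with Python's `statistics.NormalDist`; it is the textbook normal-theory curve, not necessarily how the example chart above was produced, and `xbar_oc` is a hypothetical name.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def xbar_oc(shift, n, k=3.0):
    """Probability that a k-sigma X-Bar chart misses a mean shift (normal data, known sigma).

    `shift` is in process standard deviations; the plotted mean moves by
    shift * sqrt(n) of its own standard deviations.
    """
    d = shift * math.sqrt(n)
    return PHI(k - d) - PHI(-k - d)


for n in (2, 3, 5, 10, 20, 30, 50):
    half = 3.0 / math.sqrt(n)  # shift where the miss probability is about 50 percent
    print(f"n={n:2d}  50% point at {half:.2f} sd  OC there = {xbar_oc(half, n):.3f}")
```

Under these assumptions the 50 percent point is at 3/√2 ≈ 2.12 standard deviations for a sample size of two and about 0.42 for a sample size of 50, consistent with the readings from the example chart.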
Goal
The goal of this project was to explore how the different variable-data control charts perform given non-normal distributions and to compare the performance of the different dispersion charts.
To meet the first part of this goal, we examined how the standard-deviation-based X-Bar chart handled the three distributions discussed above as a function of sample size.
To meet the second part of the goal, we examined how each of the dispersion charts handled the normal distribution as a function of sample size. Finally, we examined how each of the dispersion charts handled each of the three distributions as a function of sample size.
The results of these experiments are given in the body of this report.
Procedure
This section describes how the project’s data and results were generated. We will discuss how the data and results for the mean and standard deviation shifts were obtained.
Process Mean Data
The overall data and result generation process consisted of four steps. For each sample size we would generate a set of baseline data. This step mimicked the normal collection of thirty good data points from which to establish the process’s mean and its upper and lower control limits. An example of the baseline data is given below:
We started by using thirty points to compute these parameters but soon changed to 100. With thirty points, a “false alarm” in the baseline would significantly skew the remainder of the data. For example, the graph above shows the calculated process mean to be about 10.2, although the random number generator used to generate this data was set for a mean of 10. Moving to 100 points corrected this skew whether or not there were “false alarm” points.
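A rough way to see why the larger baseline helps: the standard error of the estimated centerline falls as 1/√k for k baseline points. The sketch below assumes a normal(10, 1) process with subgroups of five (both assumptions, since the paper does not state the subgroup size for this example), repeatedly estimates the centerline from 30 and from 100 baseline points, and reports its spread. It models only ordinary sampling error, not false-alarm points, and `centerline_spread` is a hypothetical name.

```python
import math
import random
import statistics


def centerline_spread(baseline_points, n=5, trials=1000, seed=3):
    """Std. dev., across repeated trials, of the centerline estimated from a baseline.

    Each baseline point is the mean of a subgroup of n normal(10, 1) values.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        points = [statistics.fmean(rng.gauss(10.0, 1.0) for _ in range(n))
                  for _ in range(baseline_points)]
        estimates.append(statistics.fmean(points))
    return statistics.stdev(estimates)


for k in (30, 100):
    # Normal theory: 1 / sqrt(n * k) with n = 5.
    print(k, round(centerline_spread(k), 4), round(1 / math.sqrt(5 * k), 4))
```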
After computing the process mean and the upper and lower control limits, we would then generate 10000 points of experimental data for each process shift. Some experimenting was needed to determine which shifts would generate a reasonable OC chart. We settled on shifts in steps of 0.1 standard deviation from 0.1 to 4.0, plus single points at 5.0 and 6.0 standard deviations. An example of a three-sigma process mean shift is given below.
Finally, at each shift we would compare each of the 10000 experimental data points against the upper and lower control limits established from the baseline data. The operating characteristic value for that shift was obtained by dividing the number of out-of-control points by 10000 and subtracting the result from one.
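The four steps can be sketched as a single Monte Carlo routine. This Python version is an illustration, not the project's tool from the appendix. It assumes each experimental "point" is the mean of one subgroup of size n, estimates the limits from 100 baseline subgroups with the s-based X-Bar construction (X-double-bar ± 3·s-bar/(c4·√n)), applies each shift by adding it to the subgroup mean, and returns one minus the out-of-control fraction at each shift. `simulate_oc` and its parameters are hypothetical names.

```python
import math
import random
import statistics


def c4(n):
    """Bias constant with E[s] = c4(n) * sigma for normal subgroups of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def simulate_oc(draw, n, rng, baseline=100, trials=10_000, sigma=1.0):
    """Simulated OC curve for an s-based X-Bar chart.

    draw(rng) returns one in-control observation; shifts are multiples of sigma,
    the assumed true process standard deviation. Returns a list of
    (shift, estimated probability of missing that shift).
    """
    # Baseline: estimate the centerline and 3-sigma limits from `baseline` subgroups.
    base = [[draw(rng) for _ in range(n)] for _ in range(baseline)]
    center = statistics.fmean(statistics.fmean(g) for g in base)
    s_bar = statistics.fmean(statistics.stdev(g) for g in base)
    half_width = 3.0 * s_bar / (c4(n) * math.sqrt(n))
    lcl, ucl = center - half_width, center + half_width

    # Shifts of 0.1 sigma from 0.1 to 4.0, then single points at 5.0 and 6.0.
    shifts = [round(0.1 * i, 1) for i in range(1, 41)] + [5.0, 6.0]
    curve = []
    for shift in shifts:
        out = 0
        for _ in range(trials):
            # One experimental point: a shifted subgroup mean checked against the limits.
            point = statistics.fmean(draw(rng) for _ in range(n)) + shift * sigma
            if point < lcl or point > ucl:
                out += 1
        # OC value: one minus the fraction of points that signalled out of control.
        curve.append((shift, 1.0 - out / trials))
    return curve
```

For example, `simulate_oc(lambda r: r.gauss(10.0, 1.0), 5, random.Random(0))` returns 42 (shift, OC) pairs for a normal process; passing a uniform or shifted-exponential draw function instead repeats the experiment for another distribution.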
