Confidence Intervals
If we are dealing with a distribution with an unknown mean $\mu$, then we could estimate $\mu$ by taking a random sample from this distribution and computing the sample mean $\bar{X}$. We know that the sample mean is an unbiased estimator of $\mu$. However, a simple “point” estimate is of little use unless there is some notion of the precision involved.
For example: If you weigh an object and the scale reads “105.6” grams, that reading tells you little if the next measurement is “234.6” grams. In this case the variability is so high that a point estimate such as 105.6 gives little information.
We learned before that if the distribution standard deviation is $\sigma$, then the sample mean $\bar{X}$ will have low variability as long as the sample size n is large. In fact, we know:
Uncertainty of $\bar{X}$ = standard deviation of $\bar{X}$ = $\sigma/\sqrt{n}$
Therefore if we average a bunch of measurements (n large), then the average has a very good chance of being “close” to the true mean $\mu$.
The interesting part comes from the CLT (central limit theorem). This will tell us how close the average is to $\mu$, and how often!
Recall that the CLT says that as long as n is large ( n > 30 ) then $\bar{X}$ is approximately normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
In fact from the 68-95-99.7 rule:
95% of the time, $\bar{X}$ will be within 2 standard deviations of $\mu$!
Actually, 2 was rounded from the more precise figure of 1.96. (You can check the normal table to verify this.)
95% of the time, $\bar{X}$ will be within 1.96 standard deviations of $\mu$!
This means:
95% of the time, $\bar{X}$ will be within $1.96\,\sigma/\sqrt{n}$ of $\mu$!
Therefore, the following method of “locating” $\mu$ is 95% reliable:
1. Take a large random sample (large enough so the CLT applies) and compute the sample mean $\bar{X}$.
2. Compute the margin of error $1.96\,\sigma/\sqrt{n}$. (If $\sigma$ is unknown, as would often be the case, use the sample standard deviation s instead.)
3. The true mean $\mu$ is asserted to be within the interval $\left(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right)$
The interval computed in step 3 is called a 95% confidence interval for $\mu$.
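The three steps can be sketched in a few lines of Python. This is a minimal illustration, assuming $\sigma$ is known; the function name `ci95_known_sigma` and the numbers passed to it are hypothetical and chosen only to show the arithmetic.

```python
import math

def ci95_known_sigma(xbar, sigma, n):
    """Return the 95% confidence interval xbar +/- 1.96*sigma/sqrt(n)."""
    margin = 1.96 * sigma / math.sqrt(n)  # step 2: margin of error
    return xbar - margin, xbar + margin   # step 3: the interval

# Hypothetical inputs: sample mean 50, sigma 10, sample size 100.
low, high = ci95_known_sigma(xbar=50.0, sigma=10.0, n=100)
print(low, high)
```

With these inputs the margin of error is 1.96 * 10 / 10 = 1.96, so the interval is centered at 50 and extends 1.96 on each side.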
Observe that the method above was said to be “95% reliable”. This means the interval computed will actually contain the true mean for 95% of all possible random samples. This also means that for 5% of all possible samples, the computed confidence interval fails to “capture” $\mu$. This is the meaning of 95% confidence.
Think about this: The METHOD is 95% reliable. The METHOD will produce a “good” interval 95% of the time.
However, once you choose a sample, there is no way of being sure whether or not you have produced a good interval.
Once a confidence interval is computed, the notion of probability does not apply to this individual interval. This is due to the fact that the interval, once computed, is not random. Neither is the true mean. And probability only addresses random phenomena.
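One way to see that the reliability belongs to the method is to repeat the method many times on simulated data and count how often the interval captures the true mean. The sketch below assumes a normal population with made-up parameters (mean 10, standard deviation 2) and a hypothetical helper named `coverage`; it is an illustration, not a proof.

```python
import math
import random

def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1  # this particular interval captured mu
    return hits / trials

# Hypothetical population: mean 10, standard deviation 2.
rate = coverage(mu=10.0, sigma=2.0, n=40, trials=2000)
print(rate)
```

Each individual interval either contains 10 or it does not; only the long-run fraction of successes is close to 0.95.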
Example: One wishes to determine the true mean diameter of a certain type of cylinder. A researcher follows the method above and computes a 95% confidence interval of (2.431, 2.592) centimeters.
True or False: The probability that the true diameter is within (2.431, 2.592) is 95%.
False! The notion of confidence does not pertain to an interval once the interval is computed. Think about it: Either the interval contains the true value, or it does not. The interval itself is not random, and neither is the true mean diameter. Not only that, but if an independent researcher computed (2.612, 2.771) as a 95% confidence interval, how could they both contain the true mean with 95% probability?
Example: One wishes to determine the true mean diameter of a certain type of cylinder. A researcher decides to follow the method above and compute a 95% confidence interval.
True or False: The probability that the method he follows produces a “good” interval is 95%.
True! Note here we are talking about the method, not a realized interval.
Other levels of confidence besides 95%
There is no real reason to have to stick with 95% confidence. We may wish to adjust the reliability (confidence) either up or down. The method stays almost the same, with just one modification.
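A 99% confidence interval for the mean is $\bar{X} \pm 2.576\,\sigma/\sqrt{n}$
A 95% confidence interval for the mean is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
A 90% confidence interval for the mean is $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$
In general, a $100(1-\alpha)\%$ confidence interval for the mean is $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal. In general, the following are true for the critical value $z_{\alpha/2}$: $P(Z > z_{\alpha/2}) = \alpha/2$ and $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$, where $Z$ is a standard normal random variable.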
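Critical values for any confidence level can be looked up in a normal table or computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the 100(1 - alpha/2)th percentile of N(0, 1)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

Rounded to three decimals, the values for 90%, 95% and 99% are 1.645, 1.96 and 2.576.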
You can always make a CI. But when is it valid?
The assumptions for this to be a valid confidence interval are:
a) The sample size is large enough that the Central Limit Theorem is in effect,
-or-
b) If the sample size is small, the population being sampled must be normally distributed (so that the sample mean is itself normally distributed).
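What to do if $\sigma$ is unknown
Replace the standard deviation $\sigma$ with the sample standard deviation s, where $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$, so that the confidence interval has the form: $\bar{X} \pm z_{\alpha/2}\,s/\sqrt{n}$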
Caution: s yields a precise estimate of the standard deviation only if the sample size is large. Thus confidence intervals of the above form are valid only if n is large (at least as large as what is needed for the central limit theorem).
Confidence intervals based on s for smaller sample sizes will be treated in a later lecture.
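As a rough sketch of the large-sample interval with s in place of the unknown standard deviation, the code below computes the sample mean and the sample standard deviation (with the n - 1 divisor) from raw data. The function name `ci_unknown_sigma` is hypothetical, and the data are simulated purely for illustration; they are not measurements from any real study.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ci_unknown_sigma(sample, confidence=0.95):
    """Large-sample CI: xbar +/- z * s / sqrt(n), with s estimated from the data."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation, n - 1 divisor
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Simulated data (an assumption for illustration): 40 draws from a normal population.
rng = random.Random(1)
data = [rng.gauss(15.85, 0.4) for _ in range(40)]
low, high = ci_unknown_sigma(data)
print(low, high)
```

The sample size of 40 is used here so that the large-sample requirement stated above is met.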
Observations:
The midpoint of the confidence interval is the sample mean $\bar{X}$
The width of the confidence interval (twice the margin of error) is determined by three quantities:
1. The standard deviation $\sigma$
2. The specified level of confidence: $100(1-\alpha)\%$
(which in turn gives the correct $\alpha$ and the critical value $z_{\alpha/2}$)
3. The sample size
The width of the confidence interval
1. Increases as $\sigma$ increases
2. Increases as the confidence level $100(1-\alpha)\%$ increases (because then $\alpha$ decreases, so $z_{\alpha/2}$ increases)
3. Decreases as n increases
Note that the smaller the width of the CI, the greater the precision.
The precision of the confidence interval
1. Decreases as $\sigma$ increases
2. Decreases as the confidence level $100(1-\alpha)\%$ increases (because then $\alpha$ decreases, so $z_{\alpha/2}$ increases)
3. Increases as n increases
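These relationships can be checked numerically. The following sketch computes the full width $2 z_{\alpha/2}\,\sigma/\sqrt{n}$ for a baseline case and then changes one quantity at a time; the function name `ci_width` and all input values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width of the two-sided interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

base = ci_width(sigma=0.4, n=40, confidence=0.95)
wider_sigma = ci_width(sigma=0.8, n=40, confidence=0.95)  # larger sigma
higher_conf = ci_width(sigma=0.4, n=40, confidence=0.99)  # higher confidence
larger_n = ci_width(sigma=0.4, n=160, confidence=0.95)    # four times the sample

print(base, wider_sigma, higher_conf, larger_n)
```

Because n enters through a square root, quadrupling the sample size only halves the width.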
Example: Cereal boxes are manufactured to be 16 ounces. Fluctuations of weight around the mean are to be expected, but a consumer advocate group is interested in estimating the true mean weight. A random sample of 40 cereal boxes is selected and the sample mean weight is 15.85 ounces. Assume the standard deviation of the weights is known to be 0.4 ounces.
a) Compute a 90% CI for the true mean weight
b) Compute a 99% CI for the true mean weight
c) The interval $15.85 \pm 0.05$ corresponds to what level of confidence?
d) Compute the sample size required so that a 99% confidence interval would have margin of error (1/2 the width) less than or equal to 0.01 ounces
Solution:
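a) $15.85 \pm 1.645\,(0.4)/\sqrt{40} = 15.85 \pm 0.104$, or (15.746, 15.954)
b) $15.85 \pm 2.576\,(0.4)/\sqrt{40} = 15.85 \pm 0.163$, or (15.687, 16.013)
c) The margin of error is 0.05, so $0.05 = z_{\alpha/2}\,\sigma/\sqrt{n} = z_{\alpha/2}\,(0.4)/\sqrt{40}$. This implies that $z_{\alpha/2} = 0.7906$. According to the normal tables, 0.7906 is approximately the 78.5th percentile. Therefore, the area of the right tail is about 21.5%, and the area of the two tails together is about 43% (doubling 21.5%). So the confidence level is about 100% - 43% = 57%.
d) At 99%, the margin of error is $2.576\,(0.4)/\sqrt{n}$. Note n is the only value to be determined. Accordingly, we need $2.576\,(0.4)/\sqrt{n} \le 0.01$. Solving this for n we get $n \ge \left(2.576 \cdot 0.4 / 0.01\right)^2 = 10617.2$, so a sample of n = 10,618 boxes is required.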
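The four parts of the cereal example can be cross-checked with a short script. This is a sketch using the values given in the problem (sample mean 15.85, standard deviation 0.4, n = 40); the variable and function names are hypothetical. Because the code uses unrounded critical values rather than table values, its results can differ slightly from a hand calculation, most noticeably in the required sample size for part (d).

```python
import math
from statistics import NormalDist

Z = NormalDist()               # standard normal distribution
xbar, sigma, n = 15.85, 0.4, 40
se = sigma / math.sqrt(n)      # standard deviation of the sample mean

def two_sided_ci(confidence):
    """xbar +/- z * sigma / sqrt(n) for the cereal data."""
    z = Z.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return xbar - z * se, xbar + z * se

ci90 = two_sided_ci(0.90)      # part (a)
ci99 = two_sided_ci(0.99)      # part (b)

# Part (c): a margin of 0.05 implies a critical value, which implies a level.
z_c = 0.05 / se
level_c = 2.0 * Z.cdf(z_c) - 1.0

# Part (d): smallest n with 99% margin of error at most 0.01.
z99 = Z.inv_cdf(0.995)
n_required = math.ceil((z99 * sigma / 0.01) ** 2)

print(ci90, ci99, round(level_c, 3), n_required)
```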
One Sided Confidence Intervals
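In some instances, we only need an upper confidence bound (or a lower confidence bound) for the true mean.
In general, a $100(1-\alpha)\%$ upper confidence interval (lower confidence bound) for the mean is $\left(\bar{X} - z_{\alpha}\,\sigma/\sqrt{n},\ \infty\right)$
And a $100(1-\alpha)\%$ lower confidence interval (upper confidence bound) for the mean is $\left(-\infty,\ \bar{X} + z_{\alpha}\,\sigma/\sqrt{n}\right)$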
The same assumptions and interpretations of the two sided confidence intervals hold for the one sided intervals as well.
Notice the critical values for the one sided intervals have the form $z_{\alpha}$, but for the two sided intervals have the form $z_{\alpha/2}$.
Example: A quality assurance engineer wishes to determine the true mean melting point of a type of composite material. He samples n = 100 identical specimens and determines the sample mean melting point to be 579 degrees Fahrenheit. Assume the standard deviation of melting points is actually 21 degrees Fahrenheit. The engineer is only concerned if the material has a low melting point. If the melting point is high, this will not pose a threat. Therefore, he determines a 95% upper confidence interval, using the one sided critical value $z_{0.05} = 1.645$, to be $\left(579 - 1.645\,(21)/\sqrt{100},\ \infty\right) = (575.5,\ \infty)$
Thus, he is 95% confident that the true mean melting point is above 575.5 degrees Fahrenheit. (That is, his method is 95% reliable.)
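A lower confidence bound of this kind can be computed with the one sided critical value $z_{\alpha}$, which is about 1.645 at 95% confidence. The sketch below applies it to the melting point data from the example; the function name `lower_confidence_bound` is hypothetical.

```python
import math
from statistics import NormalDist

def lower_confidence_bound(xbar, sigma, n, confidence=0.95):
    """One-sided lower bound: xbar - z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(confidence)  # all of alpha sits in one tail
    return xbar - z_alpha * sigma / math.sqrt(n)

# Values from the melting point example: xbar = 579, sigma = 21, n = 100.
bound = lower_confidence_bound(xbar=579.0, sigma=21.0, n=100)
print(round(bound, 1))
```

The only change from the two sided calculation is that the percentile passed to `inv_cdf` is $1-\alpha$ rather than $1-\alpha/2$.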