Confidence Intervals for a population proportion

Suppose that you are running for election and you are interested in the proportion $p$ of eligible voters who support you. If you take a random sample of $n$ eligible voters and compute the sample proportion $\hat{p}$, then we already know that (approximately):

1. $\hat{p}$ is unbiased for $p$.

2. The standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ is small if the sample size is large.

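Recall that if $n$ is large enough that the expected numbers of successes and failures, $np$ and $n(1-p)$, are both sufficiently large, then the sample proportion $\hat{p}$ is approximately normally distributed. Therefore,

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

is an approximate $100(1-\alpha)\%$ CI for the true proportion $p$. However, the margin of error depends on $p$, which means it cannot be calculated if $p$ is unknown. There are two possible remedies for this: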

Traditional method: Replace $p$ with $\hat{p}$ and define

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

to be the confidence interval. This method is valid if the number of successes and the number of failures are both at least 15.

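Agresti-Coull method (the plus 4 method): Define $\tilde{n} = n + 4$ and $\tilde{p} = \frac{X + 2}{\tilde{n}}$, where $X$ is the number of successes in the sample, and define

$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}$$

to be the confidence interval. This method is valid if the sample size is at least 10.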

Notes:

  1. The Agresti-Coull method is an improvement over the traditional method.
  2. Both methods agree when the sample size is very large.
  3. The traditional method is still widely used in practice. However, it performs very badly for small values of $n$.
  4. If a confidence bound or interval gives a value less than 0 or greater than 1 (which occasionally happens, especially if the sample size is small), then replace that value with 0 or 1, respectively.
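The two methods and the clipping rule in note 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`z_critical`, `traditional_ci`, `agresti_coull_ci`, `clip01`) are made up for this example, and the Agresti-Coull version assumes the usual plus-four definitions, $\tilde{n} = n + 4$ and $\tilde{p} = (x + 2)/(n + 4)$, where $x$ is the number of successes.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(conf_level):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    alpha = 1.0 - conf_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def clip01(lo, hi):
    """Replace bounds below 0 or above 1 with 0 or 1 (note 4)."""
    return max(0.0, lo), min(1.0, hi)


def traditional_ci(x, n, conf_level=0.95):
    """Traditional interval: plug p-hat into the standard deviation."""
    p_hat = x / n
    margin = z_critical(conf_level) * sqrt(p_hat * (1.0 - p_hat) / n)
    return clip01(p_hat - margin, p_hat + margin)


def agresti_coull_ci(x, n, conf_level=0.95):
    """Plus-four interval (assumed form): add 2 successes and 2 failures."""
    n_tilde = n + 4
    p_tilde = (x + 2) / n_tilde
    margin = z_critical(conf_level) * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return clip01(p_tilde - margin, p_tilde + margin)


if __name__ == "__main__":
    # Illustrative counts only: a small sample, where the traditional
    # lower bound falls below 0 and is clipped.
    print(traditional_ci(1, 12))
    print(agresti_coull_ci(1, 12))
```

With a small sample such as 1 success in 12 trials, the unclipped traditional lower bound is negative, so the clipping step matters; for large samples the two functions return nearly the same interval, in line with note 2.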

Example: Stainless steel can be susceptible to stress corrosion cracking under certain conditions. A materials engineer is interested in determining the proportion of steel alloy failures that are due to stress corrosion cracking. In a sample of 100 failures, 20 of them were caused by stress corrosion cracking. Find a 95% CI for the proportion of failures caused by stress corrosion cracking.

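Using the traditional method (valid here, since there are 20 successes and 80 failures), $\hat{p} = 20/100 = 0.20$ and $z_{0.025} = 1.96$, so

$$0.20 \pm 1.96\sqrt{\frac{0.20(0.80)}{100}} = 0.20 \pm 0.078,$$

that is, approximately $(0.122, 0.278)$.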

Notice in the previous example the margin of error is .078. Suppose we wanted to increase the sample size so that the margin of error becomes no greater than 0.02. How do we do this? What makes this tricky is that the margin of error depends on the sample as well, and not just on the sample size. This is the subject of the following:

Computing the sample size needed for a specified margin of error

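Suppose we want the margin of error to be at most $m$, that is,

$$z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le m.$$

Two ideas:

  1. If we have a $\hat{p}$, and it is somewhat “reliable” or “stable”, meaning that we expect $\hat{p}$ not to change much after re-computing its value on different (and larger) samples, we just plug that value of $\hat{p}$ into the inequality and solve for $n$: $n \ge \left(\frac{z_{\alpha/2}}{m}\right)^2 \hat{p}(1-\hat{p})$.
  2. If we believe $\hat{p}$ to be unreliable, then we use a “conservative” method, substituting $\frac{1}{2}$ for $\hat{p}$ in the previous inequality: $n \ge \frac{z_{\alpha/2}^2}{4m^2}$.

The reasoning is this: since $\hat{p}(1-\hat{p})$ for $0 \le \hat{p} \le 1$ is maximized by taking $\hat{p} = \frac{1}{2}$, we know that the margin of error, no matter what, is no greater than $\frac{z_{\alpha/2}}{2\sqrt{n}}$. Therefore, the value of $n$ that makes this small enough will also make $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ small enough.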
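Both ideas amount to solving $z_{\alpha/2}\sqrt{p(1-p)/n} \le m$ for $n$, which gives $n \ge (z_{\alpha/2}/m)^2\, p(1-p)$, and then rounding up. A minimal Python sketch follows; the function name `sample_size` and the parameter `p_guess` are hypothetical, chosen only for this illustration, and the sketch assumes the traditional form of the margin of error.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, conf_level=0.95, p_guess=None):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin.

    If p_guess is None, use the conservative choice p = 1/2,
    which maximizes p(1-p).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    p = 0.5 if p_guess is None else p_guess
    return ceil((z / margin) ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: a margin of 0.03 at 95% confidence.
    print(sample_size(0.03, p_guess=0.3))  # uses a "reliable" estimate
    print(sample_size(0.03))               # conservative, p = 1/2
```

The conservative call never returns a smaller value than the call with an estimate, since $p(1-p) \le \frac{1}{4}$ for every $p$ between 0 and 1.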

Example: (Continuation)

In the stress corrosion cracking example, find the sample size needed to specify the proportion within $\pm 0.02$ with 95% confidence, assuming:

a) The value of $\hat{p}$ computed is reliable.

b) $\hat{p}$ may be unreliable, and so we need a conservative estimate of the sample size needed.

Solution:

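a) With $\hat{p} = 20/100 = 0.20$, $z_{0.025} = 1.96$, and a margin of error of $0.02$: $n \ge \left(\frac{1.96}{0.02}\right)^2 (0.20)(0.80) = 1536.64$, so take $n = 1537$.

b) Replacing $\hat{p}$ with $\frac{1}{2}$: $n \ge \left(\frac{1.96}{0.02}\right)^2 (0.5)(0.5) = 2401$, so take $n = 2401$.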