Review Binomial
1. Use the binomial distribution when each observation has only one of two possible outcomes (i.e. success/failure). The probability of success, called p, is the same for each observation.
2. There is a fixed number n of observations, and the observations are independent.
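Formula:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, for $k = 0, 1, \dots, n$
where
$\binom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$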
Example: Twenty percent of American households own three or more motor vehicles. You choose 12 households at random.
a. What is the probability that none of the chosen households owns three or more vehicles? What is the probability that at least one household owns three or more vehicles?
b. What is the probability that between one and three (inclusive) of the chosen households own three or more vehicles?
c. What are the mean and standard deviation of the number of households in your sample that own three or more vehicles?
Solution:
X~B(n,p); X~B(12, 0.20)
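a. i. $P(X = 0) = \binom{12}{0}(0.20)^0(0.80)^{12} = 0.0687$
ii. $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.0687 = 0.9313$
b. $P(X = 1) + P(X = 2) + P(X = 3) = 0.2062 + 0.2835 + 0.2362 \approx 0.726$
c. Mean: $np = 12(0.20) = 2.4$. Standard deviation: $\sqrt{np(1 - p)} = \sqrt{12(0.20)(0.80)} = \sqrt{1.92} \approx 1.39$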
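As a check on the arithmetic, the binomial probabilities for this example can be computed directly. The following is a minimal Python sketch using only the standard library; the helper name `binomial_pmf` is illustrative and not part of the original notes.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Values from the household example: n = 12 households, p = 0.20.
n, p = 12, 0.20

p_none = binomial_pmf(0, n, p)                 # part a(i)
p_at_least_one = 1 - p_none                    # part a(ii), complement rule
p_one_to_three = sum(binomial_pmf(k, n, p) for k in range(1, 4))  # part b
mean = n * p                                   # part c
std_dev = sqrt(n * p * (1 - p))                # part c

print(f"P(X = 0)       = {p_none:.4f}")
print(f"P(X >= 1)      = {p_at_least_one:.4f}")
print(f"P(1 <= X <= 3) = {p_one_to_three:.4f}")
print(f"mean = {mean:.2f}, standard deviation = {std_dev:.4f}")
```

Part a(ii) uses the complement rule rather than summing twelve terms, which mirrors the hand calculation.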
Module IV Introduction to Statistical Inference
Unit 7: Confidence Interval for a Population Mean
Statistical Inference:
§ provides methods for drawing conclusions about a population from sample data
§ tells us how much we can trust those conclusions
§ requires data produced through a random sample or a randomized experiment.
§ The Law of Large Numbers tells us that the sample mean $\bar{x}$ from a large SRS will be close to the unknown population mean $\mu$. This is why we use $\bar{x}$ to estimate the mean of the population: we expect it to be close to $\mu$.
How would the sample mean vary if we took many samples of size n from the same population?
§ The Central Limit Theorem (CLT) says that $\bar{x}$ will have a distribution close to normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Therefore, if we know $\sigma$ (let's assume we do), we can find the standard deviation of $\bar{x}$:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Statistical Confidence
Recall the 68-95-99.7% rule. It says that in about 95% of all samples, the sample mean $\bar{x}$ will be within 2 standard deviations of the population mean $\mu$. Since the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$, the sample mean will be within $2\sigma/\sqrt{n}$ of $\mu$. If $\bar{x}$ is within $2\sigma/\sqrt{n}$ of the unknown $\mu$, then $\mu$ is within $2\sigma/\sqrt{n}$ of $\bar{x}$, in 95% of all samples.
So, in 95% of all samples, the unknown $\mu$ lies between
$\bar{x} - 2\sigma/\sqrt{n}$ and $\bar{x} + 2\sigma/\sqrt{n}$
This interval is known as a Confidence Interval (CI) for $\mu$. It is a 95% CI because it contains the unknown mean in 95% of all possible samples.
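The statement that the interval captures the unknown mean in about 95% of all samples can be illustrated by simulation. The sketch below assumes a normal population; the values mu = 50, sigma = 10, n = 25 and the function name `coverage` are hypothetical choices made only for illustration. In a real study the mean is unknown, and it is fixed here only so that captures can be counted.

```python
import random
from math import sqrt


def coverage(mu, sigma, n, reps, seed=0):
    """Fraction of simulated samples whose interval
    x_bar +/- 2*sigma/sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    half_width = 2 * sigma / sqrt(n)   # 2 standard deviations of x_bar
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


# Hypothetical population and sample size, for illustration only.
rate = coverage(mu=50.0, sigma=10.0, n=25, reps=4000)
print(f"Share of intervals that captured mu: {rate:.3f}")
```

Each individual interval either contains the mean or does not; the 95% figure describes the long-run share of intervals that do.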
A level C Confidence Interval for a parameter has 2 parts:
1. An interval calculated from the data, usually of the form:
estimate ± margin of error
where the estimate ($\bar{x}$ in this case) is our guess of the value of the unknown parameter. The margin of error shows how accurate we believe our guess to be, based on the variability of the estimate.
2. A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples. You choose C; usually it is 0.90, 0.95, or 0.99.
Confidence Intervals for the Mean
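If we know $\sigma$, we can standardize $\bar{x}$ to get the one-sample z statistic:
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
The statistic z has the N(0,1) distribution when $\bar{x}$ is normally distributed. To find a level C confidence interval, mark the central area C under the standard normal curve.
Let z* be the point on the standard normal distribution such that the central area C lies between -z* and z*. Then, with probability C,
$-z^* \le \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z^*$, which rearranges to $\bar{x} - z^* \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z^* \dfrac{\sigma}{\sqrt{n}}$
This is a level C confidence interval for $\mu$.
The value z* is called the critical value.
Let's try another example: find the critical value z* for a 98% confidence interval. z* = 2.326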
Conf. Level   Tail Area   z*
90%           0.05        1.645
95%           0.025       1.960
99%           0.005       2.576
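The critical values in this table can be reproduced from the standard normal distribution: for confidence level C, the tail area is (1 - C)/2, and z* is the point with area 1 - (1 - C)/2 to its left. The following is a short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `critical_value` is an illustrative choice.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z* such that the central area under N(0, 1)
    between -z* and z* equals `confidence`."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.98, 0.99):
    tail = (1 - level) / 2
    print(f"{level:.0%}  tail area {tail:.3f}  z* = {critical_value(level):.3f}")
```

This also gives the 98% critical value, which is not listed in the table.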
Confidence Interval for a Population Mean
Draw an SRS of size n from a population having unknown mean $\mu$ and known standard deviation $\sigma$. A level C confidence interval for $\mu$ is
$\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
The interval is exact when the population distribution is normal and is approximately correct for large n in other cases.
Example: ,
95% CI
(45.4, 50.6)
Confidence Interval Behaviour
Let's take a closer look at the margin of error
$z^* \dfrac{\sigma}{\sqrt{n}}$
It is composed of 3 parts: $z^*$, $\sigma$, and $\sqrt{n}$.
What happens as each of these changes?
§ As $z^*$ gets smaller, the margin of error gets smaller and the CI narrower. A smaller $z^*$ corresponds to a lower confidence level.
§ As $\sigma$ decreases, the CI also gets narrower. With smaller variation, it is easier to pin down $\mu$.
§ As n increases, the margin of error gets smaller (for fixed confidence level).
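The effect of each component on the margin of error can be seen by changing one at a time. The sketch below is illustrative only: the baseline values (z* = 1.96, sigma = 0.2, n = 25) and the function name `margin_of_error` are assumptions chosen for the demonstration, not values from the notes.

```python
from math import sqrt


def margin_of_error(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a z interval."""
    return z_star * sigma / sqrt(n)


# Hypothetical baseline: 95% confidence, sigma = 0.2, n = 25.
base = margin_of_error(1.96, 0.2, 25)
lower_confidence = margin_of_error(1.645, 0.2, 25)   # smaller z* (90% level)
smaller_sigma = margin_of_error(1.96, 0.1, 25)       # less variation
larger_sample = margin_of_error(1.96, 0.2, 100)      # four times the sample

print(f"baseline margin:          {base:.4f}")
print(f"90% instead of 95%:       {lower_confidence:.4f}")
print(f"sigma halved:             {smaller_sigma:.4f}")
print(f"n multiplied by 4:        {larger_sample:.4f}")
```

Because n enters through a square root, multiplying the sample size by 4 only halves the margin of error, while halving the standard deviation has the same effect directly.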
Example 6.6: A test for the level of potassium in the blood is not perfectly precise. Moreover, the actual level of potassium in a person's blood varies slightly from day to day. Suppose that repeated measurements for the same person on different days vary normally with $\sigma = 0.2$.
a. Julie's potassium level is measured once. The result is x = 3.2. Give a 90% confidence interval for her mean potassium level.
b. If three measurements were taken on different days and the mean result is $\bar{x} = 3.2$, what is a 90% CI for Julie's mean blood potassium level?
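Solution:
a. x = 3.2, n = 1, $\bar{x} = 3.2$, $z^* = 1.645$
$\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}} = 3.2 \pm 1.645 \left( \dfrac{0.2}{\sqrt{1}} \right)$
$= 3.2 \pm 0.329$
$\approx (2.9, 3.5)$
b. n = 3, $\bar{x} = 3.2$, $z^* = 1.645$
$\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}} = 3.2 \pm 1.645 \left( \dfrac{0.2}{\sqrt{3}} \right)$
$= 3.2 \pm 0.19$
$= (3.01, 3.39)$
We are 90% confident that this interval captures Julie's mean potassium level; that is, intervals calculated this way capture the true mean in 90% of repeated samples.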
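The two intervals in Example 6.6 can be reproduced with a small helper. This is a sketch under the same assumptions as the example (normal measurements, known sigma); the function name `z_interval` is hypothetical, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Level-`confidence` interval for the mean when sigma is known:
    x_bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 6.6: sigma = 0.2, observed mean 3.2, 90% confidence.
low_one, high_one = z_interval(3.2, 0.2, n=1, confidence=0.90)
low_three, high_three = z_interval(3.2, 0.2, n=3, confidence=0.90)

print(f"one measurement:    ({low_one:.2f}, {high_one:.2f})")
print(f"three measurements: ({low_three:.2f}, {high_three:.2f})")
```

Averaging three measurements narrows the interval by a factor of the square root of 3 compared with a single measurement.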
Choosing a Sample Size
Since the sample size has an effect on the width of the CI, it should be considered carefully before any sampling is done.
The margin of error is $m = z^* \dfrac{\sigma}{\sqrt{n}}$
To get the sample size corresponding to the desired m, substitute values for m, z* and σ (known) and solve for n:
$n = \left( \dfrac{z^* \sigma}{m} \right)^2$
NOTE: ALWAYS ROUND UP
Example 6.10: To assess the accuracy of a laboratory scale, a standard weight known to weigh 10 grams is weighed repeatedly. The scale readings are normally distributed with unknown mean. The standard deviation of the scale readings is known to be 0.0002 g.
a. The weight is weighed five times. The mean result is 10.0023 g. Give a 98% CI for the mean of repeated measurements of the weight.
b. How many measurements must be averaged to get a margin of error of ±0.0001 with 98% confidence?
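a. n = 5, $\bar{x} = 10.0023$, 98%
$z^* = 2.326$
$\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
$= 10.0023 \pm (2.326) \dfrac{0.0002}{\sqrt{5}}$
$= 10.0023 \pm 0.00021$
$= (10.0021, 10.0025)$
b. m = 0.0001
$n = \left( \dfrac{z^* \sigma}{m} \right)^2 = \left( \dfrac{(2.326)(0.0002)}{0.0001} \right)^2$
n = 21.64, which rounds up to n = 22 measurements.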
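The sample-size calculation, including the rule to always round up, can be written as a short function. This is a minimal sketch; the name `required_sample_size` is illustrative, and the inputs are the values from Example 6.10.

```python
from math import ceil


def required_sample_size(z_star, sigma, margin):
    """Smallest whole n with z* * sigma / sqrt(n) <= margin.
    The exact solution (z* * sigma / margin)**2 is rounded UP,
    because rounding down would give a margin larger than requested."""
    return ceil((z_star * sigma / margin) ** 2)


# Example 6.10(b): z* = 2.326 (98%), sigma = 0.0002 g, m = 0.0001 g.
n_needed = required_sample_size(2.326, 0.0002, 0.0001)
print(f"measurements needed: {n_needed}")
```

Note that the population size does not appear anywhere in the function: only the critical value, the standard deviation, and the desired margin matter.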
NOTE: SAMPLE SIZE DETERMINES MARGIN OF ERROR. POPULATION SIZE DOES NOT INFLUENCE THE SAMPLE SIZE WE NEED.
Cautions about the CI formula:
§ Data must be an SRS from the population.
§ Do not use this formula with any sampling design other than an SRS.
§ Outliers can have a large effect on the CI. Look for outliers before beginning any analysis, and try to correct or remove them before proceeding.
§ If n is small and the population is not normal, the true confidence level will differ from the stated C.
§ A 95% confidence level does not mean that there is a 95% chance that $\mu$ is contained in the specific interval calculated. It means that the method captures $\mu$ in 95% of a long series of samples.
§ We double the margin of error when we reduce the sample size to one fourth of the original.
§ If your margin of error is too large, you can reduce it by:
o Using a lower level of confidence
o Increasing the sample size
o Reducing the standard deviation
§ The square root in the formula implies that we must multiply the number of observations by 4 in order to cut the margin of error in half.
§ The size of the population (as long as the population is much larger than the sample) does not influence the sample size we need.