STAT 211
Handout 7
(Chapter 7: Statistical Intervals based on a Single Sample)
A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.
The best estimator, the minimum variance unbiased estimator (MVUE), is the unbiased statistic with the smallest standard deviation.
A confidence interval for a population characteristic (parameter) is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval. The confidence level, 1-α, associated with a confidence interval estimate is the success rate of the method used to construct the interval.
If we repeatedly sample from a population and calculate a confidence interval each time with the data available, then over the long run the proportion of the confidence intervals that actually contain the true value of the population characteristic will be 100(1-α)% (95%, 90%, or 99% for α=0.05, 0.10, or 0.01, respectively).
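The long-run interpretation can be checked with a small simulation. The sketch below is only an illustration: the population (normal with mean 10 and standard deviation 2), the sample size 25, the number of repetitions, and the function name coverage are all assumptions chosen for the demonstration, not values from this handout.

```python
import random
from statistics import NormalDist, fmean

def coverage(trials=2000, n=25, mu=10.0, sigma=2.0, conf=0.95, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    half = z * sigma / n ** 0.5                    # half-width, identical for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half <= mu <= xbar + half:       # did this interval capture mu?
            hits += 1
    return hits / trials

# The printed proportion should be close to 0.95, not exactly 0.95.
print(coverage())
```

Each individual interval either contains the true mean or it does not; the 95% describes the method over many repetitions.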
The general form of a confidence interval:
(point estimate of the parameter) ± (critical value) × (standard error of the point estimate).
What is the best estimator for each of the parameters μ, σ², p? ______
The Empirical Rule (more precisely, the standard normal distribution) tells you that about 95% of all values of $\bar{X}$ will be within 1.96 standard deviations of the mean.
What is 1-α when you compute a 95% confidence interval? ______
What is α when you compute a 95% confidence interval? ______
What is $z_{\alpha/2}$ when you compute a 95% confidence interval? ______
Confidence Interval for a Population Mean, μ
Suppose that the parameter of interest is the population mean, μ, and that
a. the population distribution is normal
b. the value of the population standard deviation σ is known
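Let X1, X2, ..., Xn be a random sample. Then a 100(1-α)% confidence interval for μ is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the standard normal critical value that leaves area α/2 in the upper tail.
Thus, in 95% of all possible samples, μ will be captured in the following calculated confidence interval: $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.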
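Choosing the sample size: The bound on the error of estimation is $B = z_{\alpha/2}\,\sigma/\sqrt{n}$. That is, $\bar{x}$ will be within $z_{\alpha/2}\,\sigma/\sqrt{n}$ of μ. The sample size required to estimate a population mean μ to within an amount B with 100(1-α)% confidence is $n = (z_{\alpha/2}\,\sigma/B)^2$, rounded up to the next integer. The same formula can be written using the interval width, $w = 2B$; then $n = (2 z_{\alpha/2}\,\sigma/w)^2$.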
Example 1: Each of the following is a confidence interval for the true average amount of time spent by patients using a physical therapy device, computed from the same sample data: (10.90, 25.44), (13.58, 22.76)
(a) What is the value of the sample mean time spent by the patients using the physical therapy device?
(b) The confidence level for one of these intervals is 95% and for the other is 99%. Which of the intervals has the 95% confidence level and why?
Example 2: Suppose we want to estimate the average number of violent acts on TV per hour for a specific network. Data were collected by viewing a random selection of 50 prime time hours, and an average of 11.7 violent acts was recorded. Suppose it is known that σ=5.
The 95% CI for μ is (10.3141, 13.0859)
The 95% confidence interval for μ, if 100 prime time hours had been viewed and the same mean and variance had been obtained, is (10.72, 12.68)
The 90% CI for μ is (10.5368, 12.8632)
The width of the 90% confidence interval for μ is 2.3264
The bound on the error of estimation of the 90% confidence interval for μ is 1.1632
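The intervals in Example 2 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the known-σ formula (sample mean ± critical value × σ/√n); the function name z_interval is made up for illustration. The critical value is computed exactly rather than read from the table, so the last digit can differ slightly from the handout (which uses 1.645 for 90%).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Two-sided CI for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sigma / sqrt(n)                      # bound on the error of estimation
    return xbar - half, xbar + half

lo95, hi95 = z_interval(11.7, 5, 50, 0.95)      # n = 50, 95%
lo100, hi100 = z_interval(11.7, 5, 100, 0.95)   # n = 100, 95%
lo90, hi90 = z_interval(11.7, 5, 50, 0.90)      # n = 50, 90%

print(lo95, hi95)
print(lo100, hi100)
print(lo90, hi90, "width:", hi90 - lo90, "bound:", (hi90 - lo90) / 2)
```

Comparing the three results shows the two ways to narrow an interval: increase n or lower the confidence level.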
Example 3: Investigators would like to estimate the average taxable income of apartment dwellers to within $500, using a 95% CI. Suppose that previous studies show that the standard deviation is $8000. How many people should they study? (Answer: 984)
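The answer to Example 3 follows from squaring (critical value × σ / B) and rounding up. A short sketch, with the hypothetical helper name sample_size_mean:

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, bound, conf):
    """Smallest n so that the z-interval half-width is at most `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)           # always round UP to an integer

n_needed = sample_size_mean(sigma=8000, bound=500, conf=0.95)
print(n_needed)
```

Rounding up rather than to the nearest integer guarantees that the bound is met.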
Large Sample Confidence Interval for μ
Suppose that the parameter of interest is the population mean, μ, and that
a. X1, X2, ..., Xn is a random sample from a population distribution with mean μ and standard deviation σ.
b. For a large sample size n, the CLT implies that $\bar{X}$ has approximately a normal distribution for any population distribution.
c. The value of the population standard deviation σ may not be known. Instead, the value of the sample standard deviation s may be known.
If n is sufficiently large (n>40), a 100(1-α)% large sample confidence interval for μ is $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$, where s is the sample standard deviation.
Example 4: One method for solving the electric power shortage employs the construction of floating nuclear power plants located a few miles offshore in the ocean. Because there is concern about the possibility of a ship collision with the floating plant, an estimate of the density of ship traffic in the area is needed. The number of ships passing within 10 miles of the proposed power-plant location per day, recorded for 60 days during July and August, had sample mean and variance 7.2 and 8.8, respectively.
(a) Find a 98% confidence interval for the mean number of ships passing within 10 miles of the proposed power-plant location during any day time period. (Answer: (6.3077, 8.0923))
(b) Suppose a precision of ± 1 ship is desired in the 98% confidence interval for the mean number of ships passing within 10 miles of the proposed power-plant location during any day time period. What should the sample size be? (Answer: 48)
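Example 4 can be checked the same way, with the sample standard deviation taking the place of σ. This is a sketch under that large-sample assumption; the variable names are illustrative. The stated answer appears to use the rounded table value 2.33, while the code uses the exact critical value (about 2.326), so the endpoints agree only to about two decimals.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, xbar, variance = 60, 7.2, 8.8
s = sqrt(variance)                       # sample standard deviation
z = NormalDist().inv_cdf(1 - 0.02 / 2)   # z_{0.01} for 98% confidence

half = z * s / sqrt(n)
ci_low, ci_high = xbar - half, xbar + half

# Part (b): sample size for a bound of 1 ship, using s as the planning value.
n_required = ceil((z * s / 1.0) ** 2)

print(ci_low, ci_high, n_required)
```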
Example 5: I want to see how long, on average, it takes Drano to unclog a sink. In a recent commercial, the stated claim was that it takes, on average, 15 minutes. I wanted to see if that claim was true, so I tested Drano on 64 randomly selected sinks. I found that it took an average of 18 minutes with a standard deviation of 2.5 minutes. Was their claim false?
99% CI for μ is (17.1953, 18.8047)
90% CI for μ is (17.4859, 18.5141)
What is different in one-sided confidence intervals? Discussion
Example 6: Determine the confidence level for each of the following large sample one-sided confidence bounds:
(a) Upper bound: $\bar{x} + 0.93\,s/\sqrt{n}$ (Answer: 0.8238)
(b) Lower bound: $\bar{x} - 1.75\,s/\sqrt{n}$ (Answer: 0.9599)
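For a large-sample one-sided bound built with critical value z, the confidence level is the standard normal area to the left of z. The sketch below assumes the two bounds use z = 0.93 and z = 1.75; these values are inferred from the stated answers and are an assumption, as is the helper name one_sided_level.

```python
from statistics import NormalDist

def one_sided_level(z):
    """Confidence level of a one-sided bound xbar +/- z * s / sqrt(n)."""
    return NormalDist().cdf(z)   # P(Z <= z) for a standard normal Z

level_a = one_sided_level(0.93)  # assumed critical value for part (a)
level_b = one_sided_level(1.75)  # assumed critical value for part (b)
print(round(level_a, 4), round(level_b, 4))
```

A one-sided bound puts all of α in one tail, so the same z gives a higher confidence level than it would in a two-sided interval.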
A General Large Sample Confidence Interval
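When the estimator $\hat{\theta}$ satisfies the following properties,
a. The estimator has approximately a normal distribution
b. It is at least approximately unbiased
c. The standard deviation of the estimator, $\sigma_{\hat{\theta}}$, is known
The confidence interval for θ can be constructed as $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$, where $z_{\alpha/2}$ is the standard normal critical value.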
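Example 7: A large sample confidence interval for the parameter λ in the Poisson distribution is $\bar{x} \pm z_{\alpha/2}\sqrt{\bar{x}/n}$, where $\bar{x}$ estimates λ, which is both the mean and the variance of the distribution.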
Large Sample Confidence Interval for a population proportion, p
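If n is sufficiently large, a 100(1-α)% large sample confidence interval for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{p} = x/n$ is the sample proportion and $\hat{q} = 1 - \hat{p}$.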
Check if $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$ to see if you have a large sample. Otherwise, there is a formula (7.10) in your textbook, which can be used without checking if it is a large sample. That is, formula (7.10) can be used for both large and small samples.
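Choosing the sample size: The bound on the error of estimation is $B = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. That is, $\hat{p}$ will be within $z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ of p. The sample size required to estimate a population proportion p to within an amount B with 100(1-α)% confidence is $n = z_{\alpha/2}^2\,\hat{p}\hat{q}/B^2$, rounded up to the next integer. The same formula can be written using the interval width, $w = 2B$; then $n = 4 z_{\alpha/2}^2\,\hat{p}\hat{q}/w^2$.
The conservative sample size can be found when $\hat{p} = \hat{q} = 0.5$.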
What is different in one-sided confidence intervals? Discussion
Example 8: We are interested in proportion of all students enrolled in Stat211 who listen to country music. Using our class as our random sample from Stat211 students, we see that ______out of ______of you listen to country music. Estimate the true proportion of all Stat211 students that listen to country music using 90% confidence interval.
What parameter are we estimating?______
Example 9: Scripps News Service reported that 4% of the members of the American Bar Association (ABA) are African American. Suppose that this figure is based on a random sample of 400 ABA members.
(a) Is the sample size large enough to justify the use of the large-sample confidence interval for a population proportion?
(b) Construct and interpret a 90% confidence interval for the true proportion of all ABA members who are African American. (Answer: (0.0239 , 0.0561))
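Both parts of Example 9 can be verified numerically. The sketch assumes the simple large-sample interval (sample proportion ± critical value × estimated standard error) and the common rule of thumb that the expected counts of successes and failures are both at least 10; the threshold 10 and the function name proportion_interval are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf):
    """Large-sample CI for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{alpha/2}
    half = z * sqrt(p_hat * (1 - p_hat) / n)           # z * estimated standard error
    return p_hat - half, p_hat + half

p_hat, n = 0.04, 400

# Part (a): large-sample check on expected successes and failures.
large_enough = n * p_hat >= 10 and n * (1 - p_hat) >= 10

# Part (b): 90% interval.
p_low, p_high = proportion_interval(p_hat, n, 0.90)
print(large_enough, round(p_low, 4), round(p_high, 4))
```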
Example 10: I want to estimate the proportion of freshmen Aggies who will drop out before graduation. How many Aggies should I include in my study in order to estimate p within 0.05 with 95% confidence? (Answer: 385)
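With no prior estimate of p available, Example 10 uses the conservative choice of 0.5 for the sample proportion. A minimal sketch, with the hypothetical helper name sample_size_proportion:

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(bound, conf, p_guess=0.5):
    """Sample size so the CI half-width for p is at most `bound`.

    p_guess = 0.5 maximizes p * (1 - p), giving the conservative (largest) n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)

n_conservative = sample_size_proportion(bound=0.05, conf=0.95)
print(n_conservative)
```

Any other guess for p gives a smaller required n, which is why 0.5 is the safe default.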
Intervals based on a Normal Population Distribution:
When the sample size is small, we have to make specific assumptions to find the confidence intervals.
Assumption: The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with both μ and σ unknown.
When $\bar{X}$ is the sample mean of a random sample of size n from a normal distribution with mean μ and standard deviation σ, the random variable $T = (\bar{X} - \mu)/(S/\sqrt{n})$ has a probability distribution called a t-distribution with n-1 degrees of freedom.
Properties of the t-distribution: discussion (page 296 of your textbook). The t-distribution table is Table A.5 on page 725.
A 100(1-α)% confidence interval for μ is $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$, where $t_{\alpha/2,\,n-1}$ is the t critical value with n-1 degrees of freedom.
Example 11: Students were weighed in kilograms at the beginning and end of a semester-long fitness class. Assume the population of weight changes follows a normal distribution. A random sample of 12 female students yielded a mean weight change of 0.45 and a standard deviation of 1.5.
The 99% CI to estimate the true mean weight change is (-0.8949, 1.7949).
Would you believe me if I claimed the average weight change was 0?
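The interval in Example 11 can be reproduced as follows. Python's standard library has no t quantile function, so the critical value 3.106 (upper-tail area 0.005, 11 degrees of freedom) is typed in as a value read from a t table; treat it as an assumption of this sketch.

```python
from math import sqrt

n, xbar, s = 12, 0.45, 1.5
t_crit = 3.106   # ASSUMED table value: t critical value, area 0.005, df = n - 1 = 11

half = t_crit * s / sqrt(n)          # t * estimated standard error
t_low, t_high = xbar - half, xbar + half

print(round(t_low, 4), round(t_high, 4))
print("0 is inside the interval:", t_low <= 0 <= t_high)
```

Because 0 lies inside the interval, a mean weight change of 0 is a plausible value at the 99% confidence level.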
What is different in one-sided confidence intervals? Discussion
A Prediction Interval for a Single Future Value:
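Let X1, X2, ..., Xn be a random sample from a normal population distribution, and suppose we wish to predict the value of Xn+1, a single future observation. A 100(1-α)% prediction interval for Xn+1 is $\bar{x} \pm t_{\alpha/2,\,n-1}\,s\sqrt{1 + 1/n}$, where $t_{\alpha/2,\,n-1}$ is the t critical value with n-1 degrees of freedom.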
Example 12: What is the 99% prediction interval for the weight change of an individual student from the population distribution in Example 11? (Answer: (-4.3992, 5.2992))
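The prediction interval differs from the confidence interval only in the factor under the square root, which adds the variability of the single new observation. A sketch using the Example 11 summary values; the critical value 3.106 is again an assumed t-table value (area 0.005, 11 degrees of freedom).

```python
from math import sqrt

n, xbar, s = 12, 0.45, 1.5
t_crit = 3.106   # ASSUMED table value: t critical value, area 0.005, df = 11

# sqrt(1 + 1/n): the "1" is the new observation's own variance,
# the "1/n" is the uncertainty in xbar.
half = t_crit * s * sqrt(1 + 1 / n)
pi_low, pi_high = xbar - half, xbar + half

print(round(pi_low, 4), round(pi_high, 4))
```

The result is much wider than the confidence interval for the mean, and it does not shrink to zero width as n grows.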
Tolerance Intervals: Let k be a number between 0 and 100. A tolerance interval for capturing at least k% of the values in a normal distribution with a confidence level 100(1-α)% has the form $\bar{x} \pm (\text{tolerance critical value}) \cdot s$.
Table A.6 (page 726) is designed for the tolerance critical values where k=90, 95, 99 and α=0.05, 0.01 in one and two-sided intervals.
Example 13: Use Example 11 and calculate an interval that includes at least 95% of the student weight changes in the population distribution using a confidence level of 99%. (Answer: (-5.355, 6.255))
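The stated answer to Example 13 is consistent with a two-sided tolerance critical value of 3.870 for n = 12, k = 95, and 99% confidence. That number is an assumption here (it should be confirmed against the tolerance table); the sketch simply applies the form mean ± (tolerance critical value) × s.

```python
n, xbar, s = 12, 0.45, 1.5
tol_crit = 3.870   # ASSUMED two-sided tolerance critical value (n = 12, k = 95, 99% confidence)

tol_low = xbar - tol_crit * s
tol_high = xbar + tol_crit * s

print(round(tol_low, 3), round(tol_high, 3))
```

Unlike the confidence and prediction intervals, there is no square-root factor; the dependence on n is built into the tabulated critical value.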
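Confidence Intervals for the Variance, σ², and Standard Deviation, σ, of a Normal Population:
The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with parameters μ and σ². Then the random variable $(n-1)S^2/\sigma^2$
has a chi-squared ($\chi^2$) probability distribution with n-1 degrees of freedom. A 100(1-α)% confidence interval for σ² is $\left( (n-1)s^2/\chi^2_{\alpha/2,\,n-1},\ (n-1)s^2/\chi^2_{1-\alpha/2,\,n-1} \right)$, where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the chi-squared critical values with upper-tail areas α/2 and 1-α/2. Taking square roots of the endpoints gives a confidence interval for σ.
The details of the chi-squared ($\chi^2$) probability distribution will be discussed in class and the table of critical values (Table A.7, page 727) will be demonstrated.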
Example 14: Determine the following:
(a) The 95th percentile for the chi-squared distribution with n=20.
(b) The 5th percentile for the chi-squared distribution with n=20.
(c) $P(10.117 \le \chi^2 \le 30.143)$ where $\chi^2$ is a chi-squared r.v. with n=20.
(d) $P(\chi^2 < 10.283 \text{ or } \chi^2 > 35.478)$ where $\chi^2$ is a chi-squared r.v. with n=22.
Exercise 7.44:
(a) Is it plausible to assume that the data come from a normal population distribution?
(b) Calculate a 95% upper confidence bound for the population standard deviation of turbidity.
(c) Calculate a 95% CI for the population standard deviation of turbidity.
Variable      n     Mean   Median   TrMean   StDev   SE Mean
turbidity    15   25.313   25.800   25.438   1.579     0.408

Variable    Minimum   Maximum       Q1       Q3
turbidity    21.700    27.300   24.100   26.700
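Parts (b) and (c) need only n and StDev from the output above. Python's standard library has no chi-squared quantile function, so the three critical values for 14 degrees of freedom are entered as values read from a chi-squared table; they are assumptions of this sketch and should be checked against the table.

```python
from math import sqrt

n, s = 15, 1.579
df = n - 1

# ASSUMED chi-squared table values for df = 14 (subscript = upper-tail area)
chi2_025 = 26.119   # upper-tail area 0.025
chi2_975 = 5.629    # upper-tail area 0.975
chi2_95 = 6.571     # upper-tail area 0.95

ss = df * s ** 2    # (n - 1) * s^2

# (c) two-sided 95% CI for sigma: square roots of the variance endpoints
sd_low = sqrt(ss / chi2_025)
sd_high = sqrt(ss / chi2_975)

# (b) 95% upper confidence bound for sigma: all of alpha in one tail
sd_upper_bound = sqrt(ss / chi2_95)

print(sd_low, sd_high, sd_upper_bound)
```

Note that the larger critical value produces the lower endpoint, because it sits in the denominator.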
Discussion on finding the confidence interval for the linear combination of the population means
Exercise 7.51: 95% CI for $\theta = \frac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \mu_4$, where $\mu_i$ is the ith true average yield.
Treatment         n_i    x̄_i    s_i
1 (pesticide)     100    10.5    1.5
2 (pesticide)      90    10.0    1.3
3 (pesticide)     100    10.1    1.8
4 (ladybugs)      120    10.7    1.6
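A large-sample interval for a linear combination of independent sample means uses the estimate (sum of coefficient × sample mean) and the standard error (square root of the sum of coefficient² × s²/n). The sketch below assumes the three table columns are sample size, sample mean, and sample standard deviation, and assumes the combination of interest is the average of the three pesticide means minus the ladybug mean, i.e. coefficients (1/3, 1/3, 1/3, -1). Both are assumptions, as is the function name linear_combo_ci.

```python
from math import sqrt
from statistics import NormalDist

def linear_combo_ci(coefs, ns, means, sds, conf=0.95):
    """Large-sample CI for sum(c_i * mu_i) from independent samples."""
    est = sum(c * m for c, m in zip(coefs, means))                        # point estimate
    se = sqrt(sum(c ** 2 * s ** 2 / n for c, s, n in zip(coefs, sds, ns)))  # standard error
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return est - z * se, est + z * se

coefs = [1 / 3, 1 / 3, 1 / 3, -1]   # ASSUMED: pesticide average minus ladybugs
ns = [100, 90, 100, 120]
means = [10.5, 10.0, 10.1, 10.7]
sds = [1.5, 1.3, 1.8, 1.6]

lc_low, lc_high = linear_combo_ci(coefs, ns, means, sds)
print(lc_low, lc_high)
```

Under these assumptions the whole interval lies below 0, which would suggest a lower true average yield for the pesticide treatments than for the ladybug treatment.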