Introduction to Chapter 6

6.1 – Overview

Inferential Statistics: We use sample data to make generalizations, inferences, or predictions about a population.

In chapter 6: We will use sample data to estimate the value of a population proportion and population mean.

We will also present methods for determining the sample size necessary to estimate those parameters with desired accuracies.

In chapter 7: We will use sample data to test some claims, or hypotheses, about a population.

Read the Chapter Problem on page 297.

6.2 – Estimating a Population Proportion

In this section, we are going to estimate the proportion of all adult Minnesotans who oppose the photo-cop legislation. (See Chapter Problem)

We use the sample of 829 surveyed adults and consider the sample proportion of 51% as the best point estimate of the population proportion.

Since the true population proportion probably is not exactly 51%, instead of using a single value .51, we may use a range of values (or interval) that is likely to contain the true value of the population proportion. This is called a confidence interval.

A confidence interval is associated with a degree of confidence.

The degree of confidence tells us the percentage of times that the confidence interval actually should contain the population parameter (e.g. proportion or mean), assuming that the estimation process is repeated a large number of times.

We will be working under the following three assumptions:

Assumptions

1. The sample is a simple random sample (SRS).

2. The conditions for a binomial distribution are satisfied by the sample. That is: there is a fixed number of trials, the trials are independent, there are two categories of outcomes, and the probabilities remain constant for each trial. A “trial” would be the examination of each sample element to see which of the two possibilities it is.

3. The normal distribution can be used to approximate the distribution of sample proportions because np ≥ 5 and nq ≥ 5 are both satisfied. (q = 1 – p)

Notation for Proportions

p is the population proportion (percent of those who have the "quality" under discussion)

p̂ = x/n (read "p-hat") is the sample proportion (percent of those in the sample who have the "quality" under discussion)

x is the number of successes (number of those who have the "quality" under discussion) in a sample of size n.

q̂ = 1 – p̂ (read "q-hat") is the percent of those in the sample who do not have the "quality" under discussion

Some Definitions

Point Estimate: a single value (or point) used to approximate a population parameter.

Best Point Estimate of Population Proportion p: Use p̂ as the best point estimate of p.

Standard Deviation of the Distribution of Sample Proportions: √(pq/n), which is estimated by √(p̂q̂/n) because p is unknown.

Confidence Interval (or interval estimate): a range (or interval) of values used to estimate the true value of a population parameter. It is an estimate that reveals how good the point estimate is.

(A confidence interval is associated with a degree of confidence which is a measure of how certain we are that our interval contains the population parameter)

Degree of Confidence (or confidence level, or confidence coefficient): the probability or the proportion of times that the confidence interval should contain the true value of the population parameter, assuming that the estimation process is repeated a large number of times.

Degree of confidence = 1 – α (α is the complement of the confidence level)

The most common choices for confidence level are:

90% (α = .10), 95% (α = .05), 99% (α = .01)

Example: Here is an example of a confidence interval based on the sample data of 829 surveyed adult Minnesotans, 51% of whom are opposed to use of the photo-cop:

The best point estimate of p is .51.

The 0.95 (or 95%) confidence interval estimate of the population proportion is

0.476 < p < 0.544

Interpreting a Confidence Interval

In our example, we are 95% confident that the interval from 0.476 to 0.544 actually does contain the true value of p. This means that if we were to select many different samples of size 829 and construct the corresponding confidence intervals, 95% of them should actually contain the value of the population proportion p. (Note that the level of 95% refers to the success rate of the process being used to estimate the proportion, and it does not refer to the population proportion itself.)

Note: It is incorrect to say that there is a 95% chance that the true population proportion will fall between 0.476 and 0.544. (Why? Because p is a constant, not a random variable. Its value is already fixed; we just don't know what it is.)
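The "success rate of the process" idea can be illustrated with a small simulation. The sketch below is only an illustration, not part of the textbook method: it assumes a true proportion of 0.51 (an assumed value, since the real p is unknown), draws many samples of size 829, builds a 95% interval from each, and reports the fraction of intervals that contain the assumed p. The function names are made up for this example.

```python
import random
from statistics import NormalDist


def proportion_interval(x, n, confidence=0.95):
    """Return (lower, upper) for p using p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def coverage_rate(true_p, n, repetitions, confidence=0.95, seed=1):
    """Fraction of simulated intervals that contain the fixed value true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and count the "successes".
        x = sum(1 for _ in range(n) if rng.random() < true_p)
        lower, upper = proportion_interval(x, n, confidence)
        if lower < true_p < upper:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    # true_p = 0.51 is an assumed value chosen only for the simulation.
    print(coverage_rate(true_p=0.51, n=829, repetitions=1000))
```

The printed fraction should land near 0.95. Each individual interval either contains p or it does not; the 95% describes how often the procedure succeeds.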

(Read more on page 302)

Critical Values:

• z_{α/2} is the positive z value separating an area of α/2 in the right tail of the standard normal distribution.

• –z_{α/2} separates an area of α/2 in the left tail.

Example: Find the critical values for the indicated confidence levels. Use Table A-2.

Confidence Level / α / α/2 / z_{α/2}
90%
95%
99%
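As a cross-check on a table lookup, the critical values can also be computed in software. The sketch below uses Python's standard-library NormalDist; the helper name z_critical is an assumption made for this example. Values read from a printed table may differ slightly in the last digit.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a confidence level given as a decimal (e.g. 0.95)."""
    alpha = 1 - confidence
    # The area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  z={z_critical(level):.3f}")
```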

Margin of Error (E) (or maximum error of the estimate): maximum likely difference between the observed sample proportion p̂ and the true value of the population proportion p. It is calculated by multiplying the critical value and the standard deviation of the distribution of sample proportions:

E = z_{α/2} · √(p̂q̂/n)

*** Do # 14, p. 312

*** Do # 16, p 312

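Confidence Interval for the Population Proportion p:

p̂ – E < p < p̂ + E, where E = z_{α/2} · √(p̂q̂/n)

Other ways of expressing the confidence interval:

p̂ ± E or (p̂ – E, p̂ + E)

Round-Off Rule for Confidence Interval Estimates of p

Round the confidence interval limits to three significant digits.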

Procedure for Constructing a Confidence Interval for p

1) Verify that the assumptions are satisfied.

2) Find the critical value z_{α/2}.

3) Evaluate the margin of error E = z_{α/2} · √(p̂q̂/n).

4) Find the confidence interval p̂ – E < p < p̂ + E.
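The four steps can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name is made up, the simple random sample requirement cannot be checked in code, and the np ≥ 5 and nq ≥ 5 check uses p̂ in place of the unknown p. The count x = 423 is an assumption (about 51% of 829).

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Check n*p_hat and n*q_hat, find z, find E, then form the interval."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Step 1 (partial): the normal approximation needs n*p_hat >= 5 and n*q_hat >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation is not justified for this sample")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Step 2: critical value
    margin = z * sqrt(p_hat * q_hat / n)                 # Step 3: margin of error
    return p_hat - margin, p_hat + margin                # Step 4: interval limits


# 51% of 829 is about 423 respondents (x = 423 is an assumed count).
lower, upper = proportion_confidence_interval(423, 829)
print(f"{lower:.3f} < p < {upper:.3f}")
```

With these inputs the limits round to 0.476 and 0.544, which agrees with the photo-cop example earlier in this section.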

*** Do # 18, p. 312

Finding the Point Estimate and the Margin of Error from a Confidence Interval

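Point estimate of p: p̂ = (upper confidence limit + lower confidence limit)/2 (the middle of the interval)

Margin of error: E = (upper confidence limit – lower confidence limit)/2 (half the length of the interval)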
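As a quick check, the two formulas can be applied to the interval 0.476 < p < 0.544 from the earlier example. The function name below is hypothetical.

```python
def estimate_and_margin(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    point_estimate = (upper + lower) / 2   # middle of the interval
    margin = (upper - lower) / 2           # half the length of the interval
    return point_estimate, margin


p_hat, margin = estimate_and_margin(0.476, 0.544)
print(f"p-hat = {p_hat:.3f}, E = {margin:.3f}")   # p-hat = 0.510, E = 0.034
```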

*** Do # 10, p. 312

*** Do # 6, p. 312

*** Do # 8, p. 312

Determining Sample Size:

If we have an approximate idea of what p̂ is: n = [z_{α/2}]² · p̂q̂ / E²

If no estimate of p̂ is known: n = [z_{α/2}]² · 0.25 / E²

Round-Off Rule for Sample Size, n: Use the computed size if it is a whole number. If it is not a whole number, round it up to the next higher whole number.
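A short sketch of the sample size calculation, including the round-up rule, is given below. The function name and the inputs (a margin of error of 0.03 and a prior estimate of 0.51) are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_hat=None):
    """Return the sample size n needed to estimate p within the given margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # With no prior estimate, p_hat * q_hat is replaced by its largest possible value, 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / margin ** 2)   # round up to the next whole number


print(sample_size_for_proportion(0.03))               # no prior estimate: 1068
print(sample_size_for_proportion(0.03, p_hat=0.51))   # prior estimate 0.51: 1067
```

Using 0.25 when nothing is known about p̂ gives the largest (safest) sample size, because p̂q̂ can never exceed 0.25.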

*** Do # 22, p. 312

*** Do # 24, p. 312

*** Do # 26, p. 313

*** Do # 28, p. 313

*** Do # 36, p. 315

Using the TI-83 to Construct Confidence Intervals for p:

STAT>TESTS choose A:1-PropZInt.

6.3 – Estimating a Population Mean: σ Known

In this section, we will again be working with confidence intervals and sample size determination, but here our objective is to estimate a population mean, μ.

Assumptions:

1. The sample is a simple random sample. (All samples of the same size have an equal chance of being selected.)
2. The value of the population standard deviation σ is known.
3. Either or both of these conditions is satisfied:

i) The population is normally distributed, or

ii) n > 30 (The sample has more than 30 values)

The best point estimate of the population mean μ is the sample mean x̄.

*** Example: For the sample of 106 body temperatures (midnight on day 2) given in Data Set 4 in Appendix B, the mean is 98.20°F. This is the best point estimate of the population mean μ of all body temperatures.

Again, as in the previous section, to fine-tune our estimate, we may use a confidence interval, which is a range (or interval) of values that is likely to contain the true value of the population mean.

Margin of Error (E) (or maximum error of the estimate): maximum likely difference between the observed sample mean x̄ and the true value of the population mean μ. It is calculated by multiplying the critical value and the standard deviation of the sample means:

E = z_{α/2} · σ/√n

*** Do # 6, p. 327

*** Do # 7, p. 327

Confidence Interval Estimate of the Population Mean μ (With σ Known)

x̄ – E < μ < x̄ + E, where E = z_{α/2} · σ/√n

The two values x̄ – E and x̄ + E are called confidence interval limits.

Procedure for Constructing a Confidence Interval for μ, (with Known σ)

1. Verify that the required assumptions are satisfied. (We have a simple random sample, σ is known, and either the population appears to be normally distributed or n > 30.)

2. Find the critical value z_{α/2}.

3. Evaluate the margin of error E. (E = z_{α/2} · σ/√n)

4. Then, using E and the sample mean x̄, the confidence interval is: x̄ – E < μ < x̄ + E

Round-off Rule for Confidence Intervals used to Estimate μ:

a) If original data is given: use one more decimal place than original values.

b) If you are given summary statistics from a data set, use the same number of decimal places used for the sample mean.

*** Example: For the sample of 106 body temperatures (midnight on day 2) given in Data Set 4 in Appendix B, where x̄ is 98.20°F, construct the 98% confidence interval estimate of the mean body temperature for all healthy adults. Assume that the sample is a simple random sample and that it is known that σ = 0.62°F.
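One way to check a hand calculation for this example is the short sketch below. It applies E = z_{α/2} · σ/√n with the values stated in the example (n = 106, x̄ = 98.20, σ = 0.62, 98% confidence); the function name is made up, and the software critical value may differ slightly from a table value.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_sigma_known(x_bar, sigma, n, confidence):
    """Return (E, lower, upper) using E = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return margin, x_bar - margin, x_bar + margin


margin, lower, upper = mean_interval_sigma_known(98.20, 0.62, 106, 0.98)
# Summary statistics were given, so round to the two decimal places of the sample mean.
print(f"E = {margin:.2f}, {lower:.2f} < mu < {upper:.2f}")
```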

Interpret the Results

We are ______% confident that the interval from ______ to ______ actually does contain the true value of the population mean μ. This means that if we were to select many different samples of the same size (106) and construct the corresponding confidence intervals, in the long run ______% of them would actually contain the value of μ.

*** Do #12, p. 328

*** Do #22, pg. 328

Finding the Point Estimate and the Margin of Error from a Confidence Interval

(Similar to the process done with sample proportions in page 5 of notes, section 6.2)

(i.e., point estimate is middle of interval and margin of error is ½ the length of the interval)

*** Do # 17 – 20, p. 328

Determining Sample Size Required to Estimate μ

n = [z_{α/2} · σ / E]², rounded up to the next higher whole number
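The sample size formula for a mean can be sketched the same way. The inputs below (σ = 15, E = 2, 95% confidence) are hypothetical values used only to show the calculation and the round-up rule.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Return n = (z_{alpha/2} * sigma / E)^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: sigma = 15, desired margin of error E = 2, 95% confidence.
print(sample_size_for_mean(sigma=15, margin=2))   # 217
```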

What if σ is not known?

1) Use the range rule of thumb: σ ≈ range/4

2) We often use s from a pilot test (n > 30)

3) Estimate the value of σ by using the results of some other study that was done earlier.

*** Do # 14, p. 328

*** Do #28, pg. 329

*** Do #30, pg. 329

Using the TI-83 to Construct a Confidence Interval for Estimating μ

STAT>TESTS choose 7:ZInterval

*** Do #24, pg. 328

6.4 – Estimating a Population Mean: σ Not Known

In section 6.3, we constructed confidence intervals for the mean of a population whose standard deviation σ was known. This assumption is not very realistic. The methods of this section are realistic and practical and do not include a requirement that σ is known. The usual procedure is to collect sample data, compute the statistics n, x̄, and s, and use them to construct the confidence interval.

Assumptions

1. The sample is a simple random sample.

2. Either the sample is from a normally distributed population or n > 30.

The sample mean x̄ is the best point estimate of the population mean μ.

If the above two conditions are satisfied, then to find the confidence interval we use the Student t distribution instead of the normal distribution.

Student t Distribution

If the distribution of a population is essentially normal (approximately bell shaped), then the distribution of

t = (x̄ – μ) / (s/√n)

is essentially a Student t distribution for all samples of size n. The Student t distribution (or t distribution) is used to find the critical values, denoted t_{α/2}.

In a Student t distribution, the critical values t_{α/2} are found in Table A-3 by locating the degrees of freedom and the area in one or two tails.

The number of degrees of freedom for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed on all data values.

degrees of freedom = n – 1 (one less than the sample size)

Reading the Student t Tables (Two-tailed)

Suppose the conditions for using the Student t distribution are satisfied. Use Table A-3 to find the critical value t_{α/2} for the given sample size and degree of confidence:

n / Degree of confidence / t_{α/2}
20 / 95%
27 / 90%
16 / 99%
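Table lookups like these can be cross-checked numerically. The sketch below is one possible approach using only Python's standard library, not the method of the notes: it integrates the Student t density with Simpson's rule and finds t_{α/2} by bisection. All function names are made up for this example, and because it calls math.gamma directly it is only suitable for moderate sample sizes (roughly n below 340).

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0: Simpson's rule on [0, t] plus 0.5 for the left half."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(n, confidence):
    """Return t_{alpha/2} with n - 1 degrees of freedom, found by bisection."""
    df = n - 1
    target = 1 - (1 - confidence) / 2   # area to the left of t_{alpha/2}
    low, high = 0.0, 200.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for n, level in ((20, 0.95), (27, 0.90), (16, 0.99)):
    print(n, f"{level:.0%}", round(t_critical(n, level), 3))
```

Note that the degrees of freedom used are n – 1, so n = 20 corresponds to the row for 19 degrees of freedom in the table.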

Properties of the Student t Distribution

1. Different for different sample sizes. See figure below.

2. Same general symmetric bell shape as the standard normal distribution, but t curves are lower in the center and higher in the tails.

3. Has mean of t = 0.

4. Standard deviation varies with the sample size, but it is greater than 1.

5. As n gets larger, the Student t distribution gets closer to the standard normal distribution.

Conditions for Using the Student t Distribution

1. σ is unknown (if σ is known we use the methods of 6.3);

2. The parent population has a distribution that is essentially normal; or

3. If the parent population is not normally distributed, then n > 30.

Notes:

1. Criteria for deciding whether the population is normally distributed:

The population need not be exactly normal, but it should appear to be somewhat symmetric with one mode and no outliers.

To assess normality, use the last graph type in STAT PLOT (the normal probability plot) with the data list. If the points lie close to a line and there are no significant outliers, the distribution of the data is approximately normal.

2. Sample size n > 30:

This is a commonly used guideline, but sample sizes of 15 to 30 are adequate if the population appears to have a distribution that is not far from being normal and there are no outliers. For some populations with distributions that are extremely far from normal, the sample size might need to be larger than 50 or even 100.

Choose the Appropriate Distribution

*** Do #2, pg. 343

*** Do #4, pg. 343

*** Do #6, pg. 343

Margin of Error for the Estimate of μ (With σ Not Known)

E = t/2 , wheret/2 has n – 1 degrees of freedom

Confidence Intervalfor the Estimate of μ (With σ Not Known)

where E = t/2

*** Do #12, pg. 343

*** Do #14, pg. 344

Using the TI-83 to Construct a Confidence Interval for Estimating μ

STAT>TESTS choose 8:TInterval

*** Do #18, pg. 344

*** Do #20, pg. 345