Chapter 5 – Decision Making for Two Samples

Inference about Two Population Means

We want to compare the means of two populations to see whether they differ. There are two situations to consider, as shown in the following examples:

1) In an experiment designed to study the effects of illumination level on task performance (“Performance of Complex Tasks Under Different Levels of Illumination,” J. Illuminating Engineering, 1976: 235-242), subjects were required to insert a fine-tipped probe into the eyeholes of ten needles in rapid succession, both at a low light level with a black background and at a higher light level with a white background. It is of interest to compare the mean times to complete the task under the two conditions.

2) Compare the mean lifetime, $\mu_1$, for transistors produced by production line 1 to the mean lifetime, $\mu_2$, for transistors produced by production line 2. We want to know whether these two means differ.

In the first case, we are comparing related means, using dependent samples. For each member of one sample, there is a matched member of the other sample (here, the same subject performs the task under both conditions).

In the second case, we are comparing unrelated means, using independent samples. There is no natural way to match each member of one sample with a member of the other sample.

We will use somewhat different procedures for hypothesis tests, depending on whether our samples are dependent or independent.

There is another issue to be considered: are the variances of the two populations equal or unequal? This issue, of course, did not arise with inference about a single population. We will see that the procedure for inference about the difference between the means depends on whether the variances can be assumed equal.

Comparing Two Means, Independent Samples

We will assume the following:

1) We have selected a random sample from each of the two populations. The r.s., of size $n_1$, from population 1 will be denoted by $X_{11}, X_{12}, \ldots, X_{1n_1}$. These r.v.’s are assumed to be i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$. The r.s., of size $n_2$, from population 2 will be denoted by $X_{21}, X_{22}, \ldots, X_{2n_2}$. These r.v.’s are assumed to be i.i.d. with mean $\mu_2$ and variance $\sigma_2^2$.

2) The two populations are independent. This implies that all of the r.v.’s listed above are independent of each other.

3) Either both populations are normal, or the conditions of the Central Limit Theorem apply. (We may also check for normality of each population using normal probability plots with the samples of data.)

We want to estimate the difference, $\mu_1 - \mu_2$, between the population means. A logical point estimator of this parameter is $\bar{X}_1 - \bar{X}_2$, the difference between the sample means. It is easily shown that this statistic is an unbiased estimator of the parameter. It is also easily shown that the variance of the estimator is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.
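Both results follow in one line each from the assumptions above. The sketch below writes $\bar{X}_1$ and $\bar{X}_2$ for the two sample means, $\mu_i$ and $\sigma_i^2$ for the mean and variance of population $i$, and $n_i$ for the sample sizes; the variance step uses the independence of the two samples.

```latex
\begin{align*}
E(\bar{X}_1 - \bar{X}_2) &= E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2, \\
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  &= \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2)
   = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{align*}
```

The variances add, even though the means are subtracted, because $\operatorname{Var}(-\bar{X}_2) = \operatorname{Var}(\bar{X}_2)$.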

Given these results and the assumptions listed above, it is clear that the random variable

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has an approximate standard normal distribution. We want to use this fact to do inference about the difference between the two population means. However, the random variable given above depends on two other unknown parameters, so we need to estimate the two population variances.
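The claim about the standardized difference of sample means can be checked informally by simulation. The following is a minimal sketch, not part of the original notes: the population parameters, sample sizes, seed, and number of replications are all hypothetical values chosen for illustration, and both populations are taken to be normal. It repeatedly draws two independent samples, standardizes the difference in sample means using the known variances, and reports the mean and variance of the resulting values, which should be close to 0 and 1.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population parameters, assumed known for the simulation.
mu1, sigma1, n1 = 5.0, 2.0, 30
mu2, sigma2, n2 = 3.0, 1.5, 40

# Standard deviation of the difference between the two sample means.
sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

z_values = []
for _ in range(5000):
    sample1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    sample2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    # Standardize: subtract the true difference, divide by its std. deviation.
    z_values.append((diff - (mu1 - mu2)) / sd_diff)

z_mean = statistics.fmean(z_values)
z_var = statistics.variance(z_values)
print(f"mean of Z: {z_mean:.3f}, variance of Z: {z_var:.3f}")
```

In practice the population variances are unknown, which is why the standardized quantity cannot be used directly for inference.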

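If we can assume equal population variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the following statistic:

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

may be used to construct a confidence interval for $\mu_1 - \mu_2$. When both populations are normal, $T$ has a t distribution with $n_1 + n_2 - 2$ degrees of freedom. Here the quantity

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is the pooled variance estimate.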
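The pooled variance estimate is a weighted average of the two sample variances, with each weighted by its degrees of freedom, $n_i - 1$. A minimal sketch of the calculation follows; the function name `pooled_variance` and the summary statistics in the usage lines are hypothetical, chosen only for illustration.

```python
def pooled_variance(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled estimate of a common variance from two sample standard deviations."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # Weight each sample variance by its degrees of freedom (n - 1).
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Hypothetical summary statistics, for illustration only.
sp2 = pooled_variance(n1=10, s1=2.0, n2=15, s2=3.0)
print(sp2)
```

Because it is a weighted average, the pooled estimate always lies between the two sample variances (here, between 4 and 9), and it sits closer to the variance of the larger sample.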

Confidence Intervals for Differences Between Population Means

We can find confidence interval estimates for the difference between two population means (independent samples), where the two population variances are equal, using the following formula:

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

In this case, d.f. = $n_1 + n_2 - 2$.

Example: The accompanying table gives summary data on cube compressive strength (N/mm²) for concrete specimens made with a pulverized fuel-ash mix (“A study of twenty-five-year-old pulverized fuel ash concrete used in foundation structures,” Proceedings of the Institute of Civil Engineers, Mar. 1985, 149-165). We want to estimate the difference, $\mu_1 - \mu_2$, between the mean compressive strength at 7 days ($\mu_1$) and the mean compressive strength at 28 days ($\mu_2$), with 95% confidence, and interpret this interval estimate.

Age (days) / Sample Size / Sample Mean (N/mm²) / Sample SD (N/mm²)
7 / 68 / 26.99 / 4.89
28 / 74 / 35.76 / 6.43

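Since the sample standard deviations do not differ considerably, we assume that the population variances are equal. The pooled variance estimate is

$$s_p^2 = \frac{(68 - 1)(4.89)^2 + (74 - 1)(6.43)^2}{68 + 74 - 2} \approx 33.0021, \quad \text{so } s_p \approx 5.7447.$$

The critical value is

$$t_{0.025,\,140} \approx 1.9771.$$

Then the endpoints of the interval are

$$(26.99 - 35.76) - (1.9771)(5.7447)\sqrt{\frac{1}{68} + \frac{1}{74}} \approx -10.6780,$$

and

$$(26.99 - 35.76) + (1.9771)(5.7447)\sqrt{\frac{1}{68} + \frac{1}{74}} \approx -6.8620.$$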

We are 95% confident that the difference between the mean compressive strength after 7 days of curing and the mean compressive strength after 28 days of curing is between -10.6780 N/mm² and -6.8620 N/mm². Because the entire interval lies below zero, we have a high level of confidence that the mean 28-day compressive strength is higher than the mean 7-day compressive strength.
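The interval in this example can be reproduced from the summary data alone. The following is a minimal sketch under stated assumptions: the function name `pooled_t_interval` is hypothetical, and the critical value 1.9771 is an assumed t-table value for 140 degrees of freedom that the caller supplies rather than one the code computes.

```python
import math


def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """Confidence interval for mu1 - mu2, assuming equal population variances.

    t_crit is the upper alpha/2 critical value of the t distribution with
    n1 + n2 - 2 degrees of freedom; it is supplied by the caller.
    """
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


# Summary data from the concrete example (sample 1 = 7 days, sample 2 = 28 days).
# 1.9771 is the assumed t-table value for 95% confidence and 140 d.f.
lower, upper = pooled_t_interval(68, 26.99, 4.89, 74, 35.76, 6.43, t_crit=1.9771)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.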