Todd A. Brahler

Computational Statistics

Final Project

One Population

Classic Technique

This analysis involved average test scores of algebra students enrolled in MTH-126 at Wright State University. The question of interest was whether or not each of the first four average test percentages differed significantly from 65, which is considered to be a grade of ‘C’ for the test. Since it is very rare that the distribution of these percentages is normal, I thought it would be a good opportunity to compare the widths of the classic and bootstrap confidence intervals to determine if the bootstrap method consistently produced narrower confidence intervals.

Figure 1 is a boxplot of the mean percentages of the four tests, and Tables 1 through 4 contain the results of the individual normality tests for the four algebra tests. The distribution of Test 2 was the only one which did not deviate significantly from a normal distribution. Table 5 contains the results of the following hypothesis test:

H0: μ = 65 versus H1: μ ≠ 65,

where μ is the mean test percentage for a particular test. At the 5% level of significance, Test 2 (t = −2.99, p = 0.0053) and Test 3 (t = −4.47, p < .0001) were the only ones in which the mean percentage scores were significantly different from the hypothesized value of 65. The mean scores of Test 1 (t = −0.85, p = 0.4016) and Test 4 (t = −0.53, p = 0.5997) were not significantly different from the hypothesized value of 65. Table 5 also contains the 95% confidence intervals: (56.681, 68.417) for Test 1; (46.788, 61.545) for Test 2; (42.102, 56.427) for Test 3; and (54.089, 71.401) for Test 4. These intervals were consistent with the results of the four t-tests.
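The mechanics of the classic one-sample t-test can be sketched in a few lines. The snippet below is illustrative Python (the analysis itself was run in statistical software), using made-up percentages rather than the actual MTH-126 data:

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0 against a two-sided alternative."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)                # sample standard deviation
    t = (xbar - mu0) / (s / math.sqrt(n))     # t = (xbar - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical percentages, not the actual MTH-126 data:
scores = [62, 70, 55, 68, 64, 59, 71, 66]
t, df = one_sample_t(scores, 65)              # compare |t| to t(0.975, df)
```

The p-value then comes from the Student's t distribution with n − 1 degrees of freedom, as in Table 5.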

Bootstrap Technique

A bootstrap-t 95% confidence interval was calculated using equation (7.4) of Wilcox (2003),

(x̄ − t*(0.975)·s/√n , x̄ − t*(0.025)·s/√n),

where t*(0.025) and t*(0.975) are the 0.025 and 0.975 quantiles of the bootstrap distribution of

T* = (x̄* − x̄) / (s*/√n),

with x̄* and s* denoting the mean and standard deviation of a bootstrap resample of size n.

The bootstrap-t technique was chosen over the percentile bootstrap method because the latter does not perform well when the goal is to make inferences about the population mean using the sample mean, unless the sample size is very large (Wilcox, 2003). A total of 1000 T* statistics were calculated. The resulting bootstrap-t confidence interval for (a) Test 1 was (55.8119, 68.025), (b) Test 2 was (46.51105, 61.20337), (c) Test 3 was (41.31146, 55.79562), and (d) Test 4 was (52.28721, 70.27229). The intervals for Test 2 and Test 3 were the only ones that did not contain the hypothesized mean of 65, which is consistent with the results of the classic technique.
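The bootstrap-t procedure described above can be sketched as follows. This is an illustrative Python translation, not the S-PLUS code that produced the intervals reported here:

```python
import math
import random
import statistics

def bootstrap_t_ci(data, alpha=0.05, B=1000, rng=None):
    """Bootstrap-t confidence interval for the population mean.

    For each resample, T* = (xbar* - xbar) / (s* / sqrt(n)); the empirical
    quantiles of T* then stand in for the Student's-t critical values.
    """
    rng = rng or random.Random(42)
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t_stars = []
    for _ in range(B):
        boot = [rng.choice(data) for _ in range(n)]
        s_boot = statistics.stdev(boot)
        if s_boot > 0:  # skip the (rare) degenerate resample
            t_stars.append((statistics.mean(boot) - xbar) / (s_boot / math.sqrt(n)))
    t_stars.sort()
    m = len(t_stars)
    t_hi = t_stars[math.ceil((1 - alpha / 2) * m) - 1]   # upper T* quantile
    t_lo = t_stars[math.floor((alpha / 2) * m)]          # lower T* quantile
    # Note the "flip": the upper quantile sets the LOWER confidence limit.
    return xbar - t_hi * se, xbar - t_lo * se
```

With B = 1000 this mirrors the 1000 T* statistics used above; the interval is typically asymmetric when the data are skewed, which is the point of the method.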

Discussion and Conclusions

Wilcox (2003) stated that hypothesis testing using the Student's t will suffice for random samples taken from a normal population. Populations that deviate from normality, on the other hand, can inflate the sample variance, which, in turn, can impact one's control of Type I error and actual probability coverage. Bootstrap techniques make no assumptions about the sampling distribution of the sample mean. Of the four tests conducted here, only one sampling distribution did not deviate significantly from a normal distribution, so we would expect the bootstrap confidence intervals to be narrower than the classic confidence intervals.

The results of the four t-tests indicate that both techniques led to the same overall conclusion. More importantly, though, the bootstrap confidence intervals were consistently narrower than the ones produced by the classic technique. However, the widths of the two sets of intervals did not differ substantially, which would lead us to conclude that the deviations from normality might not have been large enough to affect the accuracy of the Student's t.

It is also possible that the small sample size played a significant role in the width of the bootstrap confidence intervals. According to Wilcox (2003), the actual probability coverage of bootstrap confidence intervals can be affected by small sample sizes, and for either the bootstrap-t or the bootstrap percentile method, merely increasing B does not address this problem. The sample size in this analysis was 34, which may or may not be considered “sufficiently large”. Had the sample size been larger, there might have been a greater disparity in the widths of the confidence intervals. The bottom line, though, at least for these data, was that there was very little difference between the two methods.

Two Independent Samples

Comparison of Two Independent Samples: Classic Technique

Montgomery (2005) cited a study which investigated the quality of cell phone front housings. Forty housings were manufactured in an injection molding process. Using random assignment, 20 were allowed to cool for 10 seconds and the other 20 were allowed to cool for 20 seconds. Each housing was visually inspected and assigned a quality score which ranged from 1 (i.e., “completely defective”) to 10 (i.e., “no defects”). The question of interest is whether or not a significant difference exists in the quality ratings of the two groups of housings.

Figure 2 is a boxplot of the two groups of cell phone housings. It appears that the mean rating of the 20-second group (6.50) is larger than that of the 10-second group (3.35). It also appears that the distribution of the 10-second group deviates from a normal distribution. Tables 6 and 7 contain the results for the tests of normality. They indicate that the distribution of the housings cooled for 10 seconds deviated significantly from a normal distribution, but the 20-second group did not. The test for equality of variances (see Table 8) indicated that there was no significant difference in the variances of the two samples.

For this analysis, the hypotheses of interest were

H0: μ1 = μ2 versus H1: μ1 ≠ μ2,

where μ1 is the mean rating of the housings cooled for 10 seconds and μ2 is the mean rating of the housings cooled for 20 seconds. The results of the hypothesis test are summarized in Table 8. At the 5% level of significance, we can reject the null hypothesis and conclude that there is a significant difference in the mean rating for the two groups of housings (t = −5.57, p < .0001). Table 9 contains descriptive statistics and the 95% confidence interval (−4.295, −2.005), which suggests that the cell phone housings cooled for 20 seconds resulted in significantly fewer defects than the housings that were allowed to cool for 10 seconds.

Bootstrap Analysis: Two Independent Samples

These data were analyzed using equation (8.17) of Wilcox (2003),

((x̄1 − x̄2) − t*(0.975)·ŝ , (x̄1 − x̄2) − t*(0.025)·ŝ),

where ŝ = √(s1²/n1 + s2²/n2) and the quantiles t*(0.025) and t*(0.975) come from the bootstrap distribution of

T* = [(x̄1* − x̄2*) − (x̄1 − x̄2)] / √(s1*²/n1 + s2*²/n2).

The resulting 95% confidence interval was (−4.24, −1.82), so we can conclude that there is a significant difference in the mean quality ratings of the two groups. This coincides with the results of the classic technique.
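The two-sample version can be sketched the same way. Again this is illustrative Python with made-up ratings, not the Montgomery (2005) data; the Welch-type standard error below is the usual choice for a two-sample bootstrap-t and is assumed here:

```python
import math
import random
import statistics

def bootstrap_t_two_sample(x, y, alpha=0.05, B=1000, rng=None):
    """Bootstrap-t confidence interval for mu1 - mu2 (Welch-type SE)."""
    rng = rng or random.Random(42)

    def sq_se(a):  # squared standard error of a sample mean
        return statistics.variance(a) / len(a)

    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(sq_se(x) + sq_se(y))
    t_stars = []
    for _ in range(B):
        bx = [rng.choice(x) for _ in x]
        by = [rng.choice(y) for _ in y]
        se_b = math.sqrt(sq_se(bx) + sq_se(by))
        if se_b > 0:
            bdiff = statistics.mean(bx) - statistics.mean(by)
            t_stars.append((bdiff - diff) / se_b)
    t_stars.sort()
    m = len(t_stars)
    t_hi = t_stars[math.ceil((1 - alpha / 2) * m) - 1]
    t_lo = t_stars[math.floor((alpha / 2) * m)]
    return diff - t_hi * se, diff - t_lo * se

# Hypothetical quality ratings (1-10), not the actual housing data:
ten_sec = [2, 5, 3, 4, 1, 3, 4, 2, 5, 3]
twenty_sec = [6, 8, 7, 5, 7, 6, 8, 7, 6, 5]
lo, hi = bootstrap_t_two_sample(ten_sec, twenty_sec)
```

Because resampling is done within each group, the bootstrap respects any difference in the two variances, which is why no homogeneity-of-variance assumption is needed.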

Discussion and Conclusions

Although both techniques produced similar results, the confidence interval from the bootstrap technique was slightly narrower than the one from the classic technique. This could be attributed to several factors. First, in order to use the classic technique, both groups should be randomly sampled from a normal distribution and should have equal variances. According to Montgomery (2005), the 40 observations in the experiment were run in random order. However, as stated earlier, the 10-second group deviated significantly from a normal distribution. Wilcox (2003) stated that the Student's t will perform reasonably well with regard to controlling Type I error, even if the distributions are nonnormal, but deviations from normality can inflate the variance, which will widen the resulting confidence interval.

Wilcox (2003) also stated that major problems can occur when the samples suffer from heteroscedasticity. Although the variances of the two groups did not differ significantly, it is possible that tests for homogeneity of variance do not have enough power to detect real differences (Wilcox, 2003). Both of these factors can impact control of Type I error, power, and probability coverage. In short, the more the distributions differ, the less accurate the Student's t confidence interval becomes (Wilcox, 2003).

One-Way Analysis of Variance

Classic One-Way ANOVA

Montgomery (2005) described a scenario in which a golfer believed that he played his best golf during the summer (i.e., June through September) and “shoulder” seasons (i.e., October, April, and May) and his worst golf during the winter (i.e., November through March). Golf scores for 18-hole rounds were collected for all three seasons. A one-way analysis of variance was conducted to test his hypothesis.

Table 10 contains the mean golf score, standard deviation, and number of observations for each season. Figure 3 is a boxplot of the data. The data look reasonably normally distributed and indicate that the highest mean score occurred during the winter season (89.125). The mean scores for summer (87.100) and the shoulder season (86.143) are very similar. The results of the normality tests (see Tables 11, 12, and 13) indicate that the data for each group do not deviate significantly from a normal distribution. The results of the Levene test for homogeneity of variance (see Table 14) indicate that the variances of the three groups are not significantly different.

The null and alternative hypotheses for this problem are

H0: μ1 = μ2 = μ3 versus H1: at least one mean differs,

where μ1 is the mean golf score for the summer season, μ2 is the mean golf score for the shoulder season, and μ3 is the mean golf score for the winter season. Table 15 contains the results of the one-way ANOVA. At the 5% level of significance, we fail to reject the null hypothesis (F = 2.12, p = 0.1437) and conclude that there is no significant difference in the mean golf scores for the three seasons. The data do not support the golfer's claim that he plays better golf in the summer and shoulder months.

Bootstrap Technique

These data were analyzed using the S+ function “t1waybt” using 0% trim and B = 599 (see Table 16). The results indicate that we fail to reject the null hypothesis (F = 1.87, p = 0.17) at the 5% level of significance. This suggests once again that there is no significant difference in the golfer’s mean scores across all three seasons.
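A stripped-down version of the idea behind t1waybt can be sketched in Python: center each group at its own mean so that the null hypothesis holds in the bootstrap world, resample, and compare the observed statistic to the bootstrap distribution. Note that t1waybt actually bootstraps a heteroscedastic test statistic; the classic F is used here only to keep the sketch short. The golf scores below are the raw data listed in Table 16:

```python
import random
import statistics

def f_stat(groups):
    """Classic one-way ANOVA F statistic."""
    grand = statistics.mean([v for g in groups for v in g])
    k = len(groups)
    n = sum(len(g) for g in groups)
    ssb = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

def bootstrap_anova_p(groups, B=599, rng=None):
    """Bootstrap p-value for H0: all group means are equal.  Each group is
    centered at its own mean so that H0 holds, then resampled B times."""
    rng = rng or random.Random(42)
    f_obs = f_stat(groups)
    centered = [[v - statistics.mean(g) for v in g] for g in groups]
    count = sum(
        1 for _ in range(B)
        if f_stat([[rng.choice(g) for _ in g] for g in centered]) >= f_obs
    )
    return count / B

# The golfer's scores (summer, shoulder, winter) as listed in Table 16:
golf = [
    [83, 85, 85, 87, 90, 88, 88, 84, 91, 90],
    [91, 87, 84, 87, 85, 86, 83],
    [94, 91, 87, 85, 87, 91, 92, 86],
]
f = f_stat(golf)              # matches the classic F = 2.12 in Table 15
p = bootstrap_anova_p(golf)   # non-significant, as in Table 16
```

Centering is what makes this a test: without it, the resamples would reflect the observed group differences rather than the null hypothesis.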

Discussion and Conclusion

Many of the problems encountered in the comparison of two independent groups (e.g., deviations from normality and heterogeneity of variance) come into play and can even be compounded in a one-way analysis of variance (Wilcox, 2003). However, one factor which could have played a role in the analysis of the golfer's scores is the sample sizes, all of which were 10 or less. Small sample sizes reduce the power to detect differences in both the variances and the means of the three groups. So in this analysis, the small sample sizes could have created a condition in which the power was not high enough to detect a significant difference in the mean scores.

Two-Way Analysis of Variance

Classic Technique

Montgomery (2005) described an experiment which investigated how furnace position and firing temperature affect the baked density of a carbon anode. Eighteen trials were conducted in random order using two levels of furnace position (i.e., ‘1’ and ‘2’) at three temperature levels: 800, 825, and 850 degrees Fahrenheit. There were three replicates per treatment.

On the basis of the plot of the mean carbon anode density versus temperature for both furnace positions (see Figure 4), there appears to be a slight interaction present. We can be almost certain that there is a main effect for temperature, but it is not clear if there is a main effect for furnace position. Table 17 contains the results of the two-way ANOVA. At the 5% level of significance, the interaction between firing temperature and furnace position is not significant (F = 0.91, p = 0.4271). There is evidence of significant main effects for furnace position (F = 16.00, p = 0.0018) and firing temperature (F = 1056.12, p < .0001).

The plot of the residuals versus the predicted values of carbon anode density (see Figure 5) and the results of the normality tests (see Table 18) indicate that the residuals follow a normal distribution. There does not appear to be any serious violation of the constancy-of-variance assumption, either.

Bootstrap Two-Way ANOVA

These data were analyzed using the “pbad2way” function in S-PLUS. The results are contained in Table 19. At the 5% level of significance, we can conclude that there is no significant interaction (p = 0.3305), but there are significant main effects for furnace position (p = 0) and temperature (p = 0).

Discussion and Conclusion

Factors that can impact the efficiency of the F-test in the one-way analysis of variance also come into play for the two-way analysis of variance. In short, small sample sizes can affect the power of the F-test. In addition, heteroscedasticity and significant deviations from normality can influence the results of the test. However, in the analysis of the carbon anode density, both techniques led to the same conclusion.

Figure 1: Boxplots of Percentages for Tests 1 through 4

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.937703 / Pr < W / 0.0527
Kolmogorov-Smirnov / D / 0.180484 / Pr > D / <0.0100
Cramer-von Mises / W-Sq / 0.128574 / Pr > W-Sq / 0.0449
Anderson-Darling / A-Sq / 0.77718 / Pr > A-Sq / 0.0409

Table 1: Tests of Normality for Test 1

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.972109 / Pr < W / 0.5220
Kolmogorov-Smirnov / D / 0.130164 / Pr > D / 0.1481
Cramer-von Mises / W-Sq / 0.071037 / Pr > W-Sq / >0.2500
Anderson-Darling / A-Sq / 0.429244 / Pr > A-Sq / >0.2500

Table 2: Tests of Normality for Test 2

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.930862 / Pr < W / 0.0331
Kolmogorov-Smirnov / D / 0.131934 / Pr > D / 0.1361
Cramer-von Mises / W-Sq / 0.111673 / Pr > W-Sq / 0.0788
Anderson-Darling / A-Sq / 0.77656 / Pr > A-Sq / 0.0411

Table 3: Tests of Normality for Test 3

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.877031 / Pr < W / 0.0012
Kolmogorov-Smirnov / D / 0.189339 / Pr > D / <0.0100
Cramer-von Mises / W-Sq / 0.22381 / Pr > W-Sq / <0.0050
Anderson-Darling / A-Sq / 1.380971 / Pr > A-Sq / <0.0050

Table 4: Tests of Normality for Test 4

Variable / N / Lower CL Mean / Mean / Upper CL Mean / Std Dev / Std Err
PERCENTAGE1 / 34 / 56.681 / 62.549 / 68.417 / 16.819 / 2.8844
PERCENTAGE2 / 34 / 46.788 / 54.167 / 61.545 / 21.147 / 3.6266
PERCENTAGE3 / 34 / 42.102 / 49.265 / 56.427 / 20.528 / 3.5206
PERCENTAGE4 / 34 / 54.089 / 62.745 / 71.401 / 24.809 / 4.2546
T-Tests
Variable / DF / t Value / Pr > |t|
PERCENTAGE1 / 33 / -0.85 / 0.4016
PERCENTAGE2 / 33 / -2.99 / 0.0053
PERCENTAGE3 / 33 / -4.47 / <.0001
PERCENTAGE4 / 33 / -0.53 / 0.5997

Table 5: Results of T-test

Figure 2: Boxplot for Cell Phone Housings

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.903449 / Pr < W / 0.0479
Kolmogorov-Smirnov / D / 0.219213 / Pr > D / 0.0128
Cramer-von Mises / W-Sq / 0.132008 / Pr > W-Sq / 0.0394
Anderson-Darling / A-Sq / 0.747852 / Pr > A-Sq / 0.0442

Table 6: Normality Test for 10-Second Cooling Time

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.939823 / Pr < W / 0.2379
Kolmogorov-Smirnov / D / 0.13514 / Pr > D / >0.1500
Cramer-von Mises / W-Sq / 0.072556 / Pr > W-Sq / 0.2478
Anderson-Darling / A-Sq / 0.456584 / Pr > A-Sq / 0.2431

Table 7: Normality Tests 20-Second Cooling Time

Equality of Variances
Variable / Method / Num DF / Den DF / F Value / Pr > F
SCORE / Folded F / 19 / 19 / 1.70 / 0.2559

Table 8: Test for Equality of Variances

T-Tests
Variable / Method / Variances / DF / t Value / Pr > |t|
FINAL / Pooled / Equal / 38 / -5.57 / <.0001
FINAL / Satterthwaite / Unequal / 35.6 / -5.57 / <.0001
FINAL / Cochran / Unequal / 19 / -5.57 / <.0001

Table 8 (continued): Results of T-Test for Cell Phone Housing Ratings

Variable / TIME / N / Lower CL Mean / Mean / Upper CL Mean / Std Dev / Std Err
RATING / 10 / 20 / 2.4106 / 3.35 / 4.2894 / 2.0072 / 0.4488
RATING / 20 / 20 / 5.7797 / 6.5 / 7.2203 / 1.539 / 0.3441
RATING / Diff (1-2) / -4.295 / -3.15 / -2.005 / 1.7885 / 0.5656

Table 9: Descriptive Statistics and 95% Confidence Intervals for Cell Phone Housing Ratings

Analysis Variable : Score
GROUP / N Obs / N / Mean / Std Dev
SUMMER (JUNE-SEPTEMBER) / 10 / 10 / 87.100 / 2.767
SHOULDER (OCT/APRIL/MAY) / 7 / 7 / 86.143 / 2.610
WINTER (NOV-MARCH) / 8 / 8 / 89.125 / 3.271

Table 10: Descriptive Statistics of Mean Golf Scores for Three Seasons

Figure 3: Boxplots of Golf Scores by Season

Tests for Normality: Summer
Test / Statistic / p Value
Shapiro-Wilk / W / 0.939068 / Pr < W / 0.5427
Kolmogorov-Smirnov / D / 0.176068 / Pr > D / >0.1500
Cramer-von Mises / W-Sq / 0.044822 / Pr > W-Sq / >0.2500
Anderson-Darling / A-Sq / 0.289675 / Pr > A-Sq / >0.2500

Table 11: Normality Tests for Summer

Tests for Normality: Shoulder
Test / Statistic / p Value
Shapiro-Wilk / W / 0.93362 / Pr < W / 0.5821
Kolmogorov-Smirnov / D / 0.228421 / Pr > D / >0.1500
Cramer-von Mises / W-Sq / 0.041437 / Pr > W-Sq / >0.2500
Anderson-Darling / A-Sq / 0.281681 / Pr > A-Sq / >0.2500

Table 12: Normality Tests for Shoulder

Tests for Normality: Winter
Test / Statistic / p Value
Shapiro-Wilk / W / 0.913371 / Pr < W / 0.3784
Kolmogorov-Smirnov / D / 0.24207 / Pr > D / >0.1500
Cramer-von Mises / W-Sq / 0.072343 / Pr > W-Sq / 0.2332
Anderson-Darling / A-Sq / 0.392235 / Pr > A-Sq / >0.2500

Table 13: Normality Tests for Winter

Levene's Test for Homogeneity of Score Variance
ANOVA of Squared Deviations from Group Means
Source / DF / Sum of Squares / Mean Square / F Value / Pr > F
GROUP / 2 / 50.4154 / 25.2077 / 0.49 / 0.6201
Error / 22 / 1135.5 / 51.6135

Table 14: Test for Homogeneity of Golf Score Variances

Source / DF / Sum of Squares / Mean Square / F Value / Pr > F
Model / 2 / 35.6078571 / 17.8039286 / 2.12 / 0.1437
Error / 22 / 184.6321429 / 8.3923701
Corrected Total / 24 / 220.2400000

Table 15: ANOVA Results for Golf Scores

> golf2 = selby(GOLF,1,2)
> golf2$x
[[1]]:
[1] 83 85 85 87 90 88 88 84 91 90
[[2]]:
[1] 91 87 84 87 85 86 83
[[3]]:
[1] 94 91 87 85 87 91 92 86
> t1waybt(golf2$x,tr=0,nboot=599)
[1] "Taking bootstrap samples. Please wait."
[1] "Working on group 1"
[1] "Working on group 2"
[1] "Working on group 3"
$test:
[1] 1.868565
$p.value:
[1] 0.1702838

Table 16: Bootstrap Results for Golf Scores

Figure 4: Plot of Estimated Mean Density versus Temperature by Furnace Position

Source / DF / Type III SS / Mean Square / F Value / Pr > F
POSITION / 1 / 7160.0556 / 7160.0556 / 16.00 / 0.0018
TEMP / 2 / 945342.1111 / 472671.0556 / 1056.12 / <.0001
POSITION*TEMP / 2 / 818.1111 / 409.0556 / 0.91 / 0.4271

Table 17: Results of ANOVA for Carbon Anode Density

Figure 5: Plot of Residuals versus Predicted Values of Carbon Anode Density

Tests for Normality
Test / Statistic / p Value
Shapiro-Wilk / W / 0.966554 / Pr < W / 0.7310
Kolmogorov-Smirnov / D / 0.110889 / Pr > D / >0.1500
Cramer-von Mises / W-Sq / 0.036759 / Pr > W-Sq / >0.2500
Anderson-Darling / A-Sq / 0.270143 / Pr > A-Sq / >0.2500

Table 18: Normality Tests for Carbon Anode Density Residuals

> pbad2way(2, 3, carbon$x, est=mean, conall=T, alpha=.05, nboot=2000)
[1] "Taking bootstrap samples. Please wait."
[1] "Working on group 1"
[1] "Working on group 2"
[1] "Working on group 3"
[1] "Working on group 4"
[1] "Working on group 5"
[1] "Working on group 6"
$sig.levelA:
[1] 0
$sig.levelB:
[1] 0
$sig.levelAB:
[1] 0.3305

Table 19: Results of Two-Way ANOVA for Carbon Anode Density Data

References

Montgomery, D. C. (2005). Design and analysis of experiments (6th ed.). Hoboken, NJ: John Wiley & Sons.

Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.