Statistics 5372: Experimental Statistics
Exam I Key
Problem 1. Drive Time Analysis
Data Set/Problem Description
The problem is to determine the effects of route and time of day on drive time for trucks leaving a certain factory. Three different departure times were considered (morning, afternoon, and evening) and 2 routes were selected. The experiment involved randomly selecting 10 trucks leaving the factory at each of the three time periods. Then for each time of day, 5 trucks were randomly selected to take route 1 while the other 5 were assigned to route 2. The data were analyzed as a 2-factor ANOVA model with route and time of day considered as fixed effects.
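As a cross-check on the SAS analysis, the sums of squares for this balanced 2-factor design can be recomputed directly from the raw drive times listed in Appendix B1. The following is a minimal Python sketch, not part of the original SAS analysis; the dictionary layout and variable names are illustrative assumptions.

```python
# Balanced two-factor ANOVA sums of squares, computed from the raw
# drive times listed in Appendix B1 (5 trucks per route/time cell).
from statistics import mean

# drive[(route, tofday)] -> drive times; tofday 1=morning, 2=afternoon, 3=evening
drive = {
    (1, 1): [490, 553, 489, 504, 519],
    (1, 2): [511, 490, 489, 492, 451],
    (1, 3): [435, 468, 463, 450, 444],
    (2, 1): [585, 589, 575, 570, 559],
    (2, 2): [456, 460, 464, 485, 473],
    (2, 3): [406, 422, 459, 442, 464],
}

routes = sorted({r for r, _ in drive})
times = sorted({t for _, t in drive})
n = 5  # replicates per cell

grand = mean(y for ys in drive.values() for y in ys)
cell = {k: mean(v) for k, v in drive.items()}
route_mean = {r: mean(cell[(r, t)] for t in times) for r in routes}
time_mean = {t: mean(cell[(r, t)] for r in routes) for t in times}

# Main-effect, interaction, and error sums of squares
ss_route = n * len(times) * sum((route_mean[r] - grand) ** 2 for r in routes)
ss_time = n * len(routes) * sum((time_mean[t] - grand) ** 2 for t in times)
ss_inter = n * sum(
    (cell[(r, t)] - route_mean[r] - time_mean[t] + grand) ** 2
    for r in routes for t in times
)
ss_error = sum((y - cell[k]) ** 2 for k, ys in drive.items() for y in ys)

df_inter = (len(routes) - 1) * (len(times) - 1)
df_error = len(routes) * len(times) * (n - 1)
mse = ss_error / df_error
f_inter = (ss_inter / df_inter) / mse

print(f"SS route={ss_route:.2f} tofday={ss_time:.2f} "
      f"route*tofday={ss_inter:.2f} error={ss_error:.2f}")
print(f"MSE={mse:.2f}  F(route*tofday)={f_inter:.2f}")
```

The printed values can be compared with the Type III sums of squares, the error mean square, and the interaction F value in the GLM output.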
Key Results of the Analysis
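The analysis of variance table (Table 1) indicates that there is a significant interaction between route and time of day (p < .0001). Thus, we do not test for main effects, but instead we compare the cell means. The cell means computed by SAS are given in Table 2. The LSD for comparing cell means at the $\alpha$ level of significance is
$$\mathrm{LSD} = t_{\alpha/2,\,24}\sqrt{\frac{2\,\mathrm{MSE}}{n}},$$
where MSE = 373.70 is the error mean square on 24 degrees of freedom and n = 5 is the number of trucks per cell. The value used for the comparisons in Table 3 is LSD = 25.34.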
Interaction plots of the cell means are shown in Figure 1, and the LSD multiple comparisons are given in Table 3. The interaction plots show that route 2 is a particularly bad route to take in the morning and that, in general, morning drive time is longest. The multiple comparisons show that taking route 2 in the evening results in a significantly shorter drive time than either route in the morning or afternoon. Evening drive time on either route is significantly shorter than morning or afternoon drive time on route 1. Morning drive time using route 2 is the worst time/route combination, being significantly longer than every other combination, while morning drive time using route 1 is significantly longer than route 2 in the afternoon and both evening drive times. Route does not make a significant difference in either the afternoon or the evening.
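The pairwise LSD comparisons described above can be reproduced from the cell means alone. The sketch below is an illustration in Python rather than the SAS used for the report; it takes the LSD value of 25.34 quoted in Table 3 as given instead of recomputing it, and the variable names are assumptions.

```python
from itertools import combinations

# Cell means from Table 2, keyed by the codes used in Table 3
# (M/A/E = morning/afternoon/evening, 1/2 = route), largest to smallest.
means = {"M2": 575.6, "M1": 511.0, "A1": 486.6,
         "A2": 467.6, "E1": 452.0, "E2": 438.6}
LSD = 25.34  # value quoted in Table 3 (taken as given)

not_significant = []
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    # A pair differs significantly only if its difference exceeds the LSD
    flag = "" if diff > LSD else "x"
    if flag:
        not_significant.append((a, b))
    print(f"{a} vs {b}  {diff:6.1f} {flag}")

print("Not significantly different:", not_significant)
```

Only adjacent means in the ordered list fail to differ, which is the pattern summarized by the underlines at the end of Table 3.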
In Figure 2 we present side-by-side boxplots of the data as well as a probability plot and histogram of the residuals. The box plots show a fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.
Conclusions in the Language of the Problem
The morning drive time is clearly the longest. If driving must be done in the morning, then route 1 should be taken. The shortest drive times are in the evening. Route makes the most difference in the morning, in which case, as mentioned, route 1 is preferred. On the other hand, route does not make a significant difference in either the afternoon or evening.
Appendices:
Appendix A1. Tables and Figures Cited in the Report
Table 1. SAS GLM Output for Problem 1 – Drive Time Data
2-Factor ANOVA - Drive-Time Data
The GLM Procedure
Dependent Variable: drive
                                 Sum of
Source             DF           Squares     Mean Square    F Value    Pr > F
Model               5       61776.56667     12355.31333      33.06    <.0001
Error              24        8968.80000       373.70000
Corrected Total    29       70745.36667

R-Square     Coeff Var      Root MSE    drive Mean
0.873224      3.956742      19.33132      488.5667

Source             DF       Type III SS     Mean Square    F Value    Pr > F
route               1         864.03333       864.03333       2.31    0.1414
tofday              2       49992.26667     24996.13333      66.89    <.0001
route*tofday        2       10920.26667      5460.13333      14.61    <.0001
Table 2. Cell Means for Drive Time Data
Analysis Variable : drive

route   tofday          Mean       Std Dev
  1       1      511.0000000    26.4669605
  1       2      486.6000000    21.8471966
  1       3      452.0000000    13.5462172
  2       1      575.6000000    11.9916638
  2       2      467.6000000    11.5887877
  2       3      438.6000000    24.5519857
Table 3. Calculations for LSD comparisons for Problem 1 -- Drive Time Cell Means
Code: M=morning, A=afternoon, E=evening, 1=Route1 2=Route2
M2      M1      A1      A2      E1      E2
575.6   511.0   486.6   467.6   452.0   438.6

Comparison    Actual Difference (lsd = 25.34; x = not significant)
M2 vs E2      137.0
M2 vs E1      123.6
M2 vs A2      108.0
M2 vs A1       89.0
M2 vs M1       64.6
M1 vs E2       72.4
M1 vs E1       59.0
M1 vs A2       43.4
M1 vs A1       24.4 x
A1 vs E2       48.0
A1 vs E1       34.6
A1 vs A2       19.0 x
A2 vs E2       29.0
A2 vs E1       15.6 x
E1 vs E2       13.4 x

Means joined by a common underline are not significantly different:
M2      M1      A1      A2      E1      E2
        ----------
                ----------
                        ----------
                                ----------
Figure 1. Interaction Plots for Problem 1 - Drive Time Data
Figure 2. Diagnostic Plots for Problem 1 – Drive Time Data
Appendix B1. SAS Code for Problem 1 – Drive Time Data
DATA one;
INPUT route tofday drive combined$;
datalines;
1 1 490 1M
1 1 553 1M
1 1 489 1M
1 1 504 1M
1 1 519 1M
1 2 511 1A
1 2 490 1A
1 2 489 1A
1 2 492 1A
1 2 451 1A
1 3 435 1E
1 3 468 1E
1 3 463 1E
1 3 450 1E
1 3 444 1E
2 1 585 2M
2 1 589 2M
2 1 575 2M
2 1 570 2M
2 1 559 2M
2 2 456 2A
2 2 460 2A
2 2 464 2A
2 2 485 2A
2 2 473 2A
2 3 406 2E
2 3 422 2E
2 3 459 2E
2 3 442 2E
2 3 464 2E
;
/* Two-factor fixed-effects ANOVA; residuals are saved to data set new */
PROC GLM;
CLASS route tofday;
MODEL drive=route tofday route*tofday;
output out=new r=resid;
means route tofday/lsd;
TITLE '2-Factor ANOVA - Drive-Time Data';
run;
/* Cell means and standard deviations for each route/time-of-day combination */
PROC SORT data=one; BY route tofday;
PROC MEANS mean std data=one; BY route tofday; OUTPUT OUT=cells MEAN=drive;
Title 'Cell Means for Drive Time Data';
RUN;
/* Interaction plots of the cell means */
PROC GPLOT data=cells;
TITLE 'Plot of Means';
PLOT drive*tofday=route;
Title 'Interaction Plot - Drive Time Data';
SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
SYMBOL2 V=DOT I=JOIN C=BLACK;
RUN;
PROC GPLOT data=cells;
PLOT drive*route=tofday;
SYMBOL1 V=M I=JOIN C=BLACK;
SYMBOL2 V=A I=JOIN C=BLACK;
SYMBOL3 V=E I=JOIN C=BLACK;
RUN;
/* Residual diagnostics: normal probability plot and histogram */
proc univariate data=new normalplot;
var resid;
title 'Normal Probability Plot for Residuals - Drive Time Data';
run;
proc gchart data=new;
vbar resid;
run;
/* Side-by-side boxplots by route/time-of-day combination */
proc boxplot data=one;
plot drive*combined;
title 'Boxplots for Drive Time Data';
run;
PROC SORT data=one; BY route tofday;
proc univariate data=one plot;
var drive; by route tofday;
run;
*------+
| Generated: Wednesday, March 9, 2005 16:18:40 |
| Data: C:\DOCUME~1\00013961\LOCALS~1\Temp\SAS Temporary Files\_TD2700\New |
+------*;
title;
footnote;
*** Probability Plots ***;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;
proc univariate data=Work.New noprint;
var RESID;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;
symbol;
goptions ftext= ctext= htext=;
Problem 2. Comparison of Training Programs
Data Set/Problem Description
In this experiment, four training programs: A, B, C, and D are compared. Twenty employees were randomly assigned to the four programs. After completion of the courses, each person was required to assemble a piece of equipment, and the time to complete the assembly was recorded. The data are shown in the accompanying SAS code and were analyzed as a 1-factor ANOVA model where training program is a fixed effect.
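The 1-factor ANOVA quantities can be checked by hand from the assembly times listed in Appendix B2. The following Python sketch is an illustrative cross-check, not the SAS analysis used for the report; the variable names are assumptions.

```python
from statistics import mean

# Assembly times by training program, as listed in Appendix B2
times = {
    "A": [60, 65, 57, 62, 61],
    "B": [57, 61, 63, 58, 64],
    "C": [64, 65, 70, 68, 70],
    "D": [64, 64, 61, 65, 67],
}

grand = mean(t for ts in times.values() for t in ts)

# Between-program and within-program (error) sums of squares
ss_program = sum(len(ts) * (mean(ts) - grand) ** 2 for ts in times.values())
ss_error = sum((t - mean(ts)) ** 2 for ts in times.values() for t in ts)

df_program = len(times) - 1
df_error = sum(len(ts) for ts in times.values()) - len(times)

mse = ss_error / df_error
f_value = (ss_program / df_program) / mse

print(f"SS program={ss_program:.1f} (df={df_program}), "
      f"SS error={ss_error:.1f} (df={df_error})")
print(f"MSE={mse:.3f}  F={f_value:.2f}")
```

The results can be compared with the Model and Error lines of the GLM output in Table 1.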
Key Results of the Analysis
The analysis of variance table (Table 1) indicates that we should reject the null hypothesis that the training programs are equally effective (p = .004). Thus, in order to examine the comparisons among programs, we use LSD to perform multiple comparisons. The LSD for comparing means at the $\alpha = 0.05$ level of significance is
$$\mathrm{LSD} = t_{.025,\,16}\sqrt{\frac{2\,\mathrm{MSE}}{n}} = 2.120\sqrt{\frac{2(7.575)}{5}} = 3.69.$$
This value for LSD is given in the SAS output in Table 1 along with the results of the comparisons. There it can be seen that people in training program C took significantly longer than those in either program A or program B. The program D mean is not significantly different from those of any of the other programs.
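The LSD value and the resulting groupings can be verified with a few lines of arithmetic. The Python sketch below is illustrative only; it takes the critical t value, error mean square, and program means from the SAS output in Table 1 rather than computing them, and the variable names are assumptions.

```python
from math import sqrt
from itertools import combinations

# Quantities read from the SAS LSD output in Table 1
t_crit = 2.11991   # critical value of t (alpha = 0.05, 16 error df)
mse = 7.575        # error mean square
n = 5              # employees per program

lsd = t_crit * sqrt(2 * mse / n)

# Program means, ordered from largest to smallest as in the SAS grouping
means = {"C": 67.4, "D": 64.2, "A": 61.0, "B": 60.6}

# A pair of programs differs significantly if its mean difference exceeds the LSD
significant = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > lsd
]

print(f"LSD = {lsd:.4f}")
print("Significantly different pairs:", significant)
```

Program D differs from program B by 3.6, just under the LSD, which is why D shares a letter with every other program in the grouping.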
In Figure 1 we present side-by-side boxplots of the data as well as a probability plot and histogram of the residuals. The box plots show a fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.
Conclusions in the Language of the Problem
On the basis of this study, programs A and B are preferable to program C, all other factors (cost of program, etc.) being equal.
Appendices:
Appendix A2. Tables and Figures Cited in the Report
Table 1. SAS GLM Output for Problem 2 – Training Program Comparison
The GLM Procedure
Dependent Variable: time
                                 Sum of
Source             DF           Squares     Mean Square    F Value    Pr > F
Model               3       151.0000000      50.3333333       6.64    0.0040
Error              16       121.2000000       7.5750000
Corrected Total    19       272.2000000

R-Square     Coeff Var      Root MSE     time Mean
0.554739      4.347981      2.752272      63.30000

Source             DF       Type III SS     Mean Square    F Value    Pr > F
program             3       151.0000000      50.3333333       6.64    0.0040

t Tests (LSD) for time

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                              0.05
Error Degrees of Freedom             16
Error Mean Square                 7.575
Critical Value of t             2.11991
Least Significant Difference     3.6901

Means with the same letter are not significantly different.

t Grouping      Mean    N    program
         A    67.400    5    C
         A
    B    A    64.200    5    D
    B
    B         61.000    5    A
    B
    B         60.600    5    B
Figure 1. Diagnostic Plots for Problem 2 – Training Program Data
Appendix B2. SAS Code for Problem 2 – Training Program Data
data training;
input program$ time;
datalines;
A 60
A 65
A 57
A 62
A 61
B 57
B 61
B 63
B 58
B 64
C 64
C 65
C 70
C 68
C 70
D 64
D 64
D 61
D 65
D 67
;
/* One-factor ANOVA with LSD comparisons; residuals are saved as restrain */
proc glm;
class program;
model time=program;
title 'ANOVA --- Training Program - Ex.1, Prob I';
output out=new r=restrain;
means program/lsd;
run;
PROC means mean var n;
class program;
run;
proc sort;
by program;
run;
PROC BOXPLOT;
plot time*program;
title 'Side-by-Side Box Plots for Training Data';
run;
/* Residual diagnostics (the residual variable is restrain, not restime) */
proc univariate;
var restrain;
run;
proc gchart;
title 'Histogram for Residuals from Training Data';
vbar restrain;
run;
proc univariate normalplot;
var restrain;
title 'Normal Probability Plot for Residuals - Training Data';
run;
*** Probability Plots ***;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;
proc univariate data=Work.New noprint;
var RESTRAIN;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;
symbol;
goptions ftext= ctext= htext=;
proc univariate data=training plot;
var time; by program;
run;
Problem 3. Athletic Shoe Marketing Study
Data Set/Problem Description
This study involved the effect of advertising on sales of a type of athletic shoe. The study was conducted in a randomly selected set of 3 standard metropolitan statistical areas (SMSA’s) within the US and in three randomly selected department stores. Four weeks of sales were randomly selected for analysis within each SMSA/store combination. The data are given below. The data are analyzed using a 2-factor ANOVA with area (SMSA) and store both being random effects.
Key Results of the Analysis
The analysis of variance table (Table 1) is based on the fact that SMSA and store are random effects. Thus, we do not use the F tests in the body of the table, but instead we use those given by the “/test” option. There it can be seen that the variance component due to the interaction between SMSA and store is highly significant (p < .0001) while the variance components due to SMSA (p = .4583) and store (p = .2778) are not significant. In order to visualize the interaction, in Figure 1 we show the interaction plot between SMSA and store. Although we are not interested in the specific interactions shown among these stores and SMSA’s, the plots show that there is interaction among stores and SMSA’s. That is, no store consistently has better sales regardless of the SMSA in which it is located. For example, in the data collected, store 2 was particularly strong in SMSA 3 but had sales similar to those of the other stores in the remaining SMSA’s. Using the expected mean squares in Table 1, the estimated variance component due to the SMSA by store interaction was
$$\hat{\sigma}^2_{\mathrm{smsa*store}} = \frac{\mathrm{MS(smsa*store)} - \mathrm{MS(Error)}}{4} = \frac{9034.99 - 244.39}{4} \approx 2197.65.$$
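The random-model F tests and method-of-moments variance component estimates follow directly from the mean squares and the expected mean squares reported in Table 1. The Python sketch below is an illustration under that assumption; it uses the mean squares as printed by SAS, and the variable names are hypothetical. Truncating a negative estimate at zero is a common convention, not something stated in the SAS output.

```python
# Mean squares from the SAS GLM output in Table 1
ms = {
    "smsa": 8623.36111,
    "store": 16212.86111,
    "smsa*store": 9034.98611,
    "error": 244.38889,
}

# Coefficients from the Type III expected mean squares:
#   E[MS(smsa)]       = Var(Error) + 4 Var(smsa*store) + 12 Var(smsa)
#   E[MS(store)]      = Var(Error) + 4 Var(smsa*store) + 12 Var(store)
#   E[MS(smsa*store)] = Var(Error) + 4 Var(smsa*store)
k_inter, k_main = 4, 12

# In the random model, main effects are tested against the interaction
# mean square; the interaction is tested against the error mean square.
f_smsa = ms["smsa"] / ms["smsa*store"]
f_store = ms["store"] / ms["smsa*store"]
f_inter = ms["smsa*store"] / ms["error"]

# Method-of-moments estimates obtained by solving the expected mean squares
var_inter = (ms["smsa*store"] - ms["error"]) / k_inter
var_smsa = (ms["smsa"] - ms["smsa*store"]) / k_main
var_store = (ms["store"] - ms["smsa*store"]) / k_main

print(f"F smsa={f_smsa:.2f}  F store={f_store:.2f}  F smsa*store={f_inter:.2f}")
print(f"Var(smsa*store) = {var_inter:.2f}")
print(f"Var(smsa)  = {var_smsa:.2f} (reported as {max(var_smsa, 0.0):.2f} if truncated at zero)")
print(f"Var(store) = {var_store:.2f}")
```

The negative raw estimate for SMSA is consistent with its nonsignificant F test: the SMSA mean square is smaller than the interaction mean square it is tested against.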
In Figure 2 the side-by-side boxplots of the data are presented as well as a probability plot and histogram of the residuals. The boxplots show a fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.
Conclusions in the Language of the Problem
The only significant variance component was due to store by area interaction. This suggests that relative strength of sales for particular stores tends to vary by area, i.e. some stores tend to be stronger in one area of the country while others will show relative strength in another area.
Appendices:
Appendix A3. Tables and Figures Cited in the Report
Table 1. SAS GLM Output for Problem 3 – Athletic Shoe Marketing Study
2-Factor Random Effects ANOVA - Sales
The GLM Procedure
Dependent Variable: sales
                                 Sum of
Source             DF           Squares     Mean Square    F Value    Pr > F
Model               8       85812.38889     10726.54861      43.89    <.0001
Error              27        6598.50000       244.38889
Corrected Total    35       92410.88889

R-Square     Coeff Var      Root MSE    sales Mean
0.928596      10.39117      15.63294      150.4444

Source             DF       Type III SS     Mean Square    F Value    Pr > F
smsa                2       17246.72222      8623.36111      35.29    <.0001
store               2       32425.72222     16212.86111      66.34    <.0001
smsa*store          4       36139.94444      9034.98611      36.97    <.0001

Source         Type III Expected Mean Square
smsa           Var(Error) + 4 Var(smsa*store) + 12 Var(smsa)
store          Var(Error) + 4 Var(smsa*store) + 12 Var(store)
smsa*store     Var(Error) + 4 Var(smsa*store)

Tests of Hypotheses for Random Model Analysis of Variance
Dependent Variable: sales

Source             DF       Type III SS     Mean Square    F Value    Pr > F
smsa                2             17247     8623.361111       0.95    0.4583
store               2             32426           16213       1.79    0.2778
Error               4             36140     9034.986111
Error: MS(smsa*store)

Source             DF       Type III SS     Mean Square    F Value    Pr > F
smsa*store          4             36140     9034.986111      36.97    <.0001
Error: MS(Error)   27       6598.500000      244.388889
Figure 1. Interaction Plots for Problem 3 - Marketing Data
Figure 2. Diagnostic Plots for Problem 3 – Marketing Data
Appendix B3. SAS Code for Problem 3 – Athletic Shoe Marketing Study
DATA one;
INPUT smsa store sales combined$;
datalines;
1 1 129 11
1 1 111 11
1 1 128 11
1 1 148 11
1 2 187 12
1 2 171 12
1 2 145 12
1 2 166 12
1 3 150 13
1 3 102 13
1 3 127 13
1 3 125 13
2 1 138 21
2 1 152 21
2 1 125 21
2 1 135 21
2 2 117 22
2 2 130 22
2 2 142 22
2 2 119 22
2 3 110 23
2 3 125 23
2 3 142 23
2 3 123 23
3 1 150 31
3 1 128 31
3 1 115 31
3 1 128 31
3 2 280 32
3 2 300 32
3 2 295 32
3 2 261 32
3 3 150 33
3 3 108 33
3 3 119 33
3 3 135 33
;
/* Two-factor ANOVA with smsa, store, and their interaction declared random;
   the /test option requests F tests that use the appropriate error terms */
PROC GLM;
CLASS smsa store;
MODEL sales= smsa store smsa*store;
output out=new r=ressales;
random smsa store smsa*store/test;
TITLE '2-Factor Random Effects ANOVA - Marketing Data';
run;
/* Cell means and standard deviations for each SMSA/store combination */
PROC SORT data=one; BY smsa store;
PROC MEANS mean std data=one; BY smsa store; OUTPUT OUT=cells MEAN=sales;
Title 'Cell Means for Sales Data';
RUN;
/* Interaction plots of the cell means */
PROC GPLOT data=cells;
TITLE 'Plot of Means';
PLOT sales*smsa=store;
Title 'Interaction Plot - Marketing Data';
SYMBOL1 V=1 I=JOIN C=BLACK;
SYMBOL2 V=2 I=JOIN C=BLACK;
SYMBOL3 V=3 I=JOIN C=BLACK;
RUN;
PROC GPLOT data=cells;
PLOT sales*store=smsa;
SYMBOL1 V=1 I=JOIN C=BLACK;
SYMBOL2 V=2 I=JOIN C=BLACK;
SYMBOL3 V=3 I=JOIN C=BLACK;
RUN;
/* Residual diagnostics: normal probability plot and histogram */
proc univariate data=new normalplot;
var ressales;
title 'Normal Probability Plot for Residuals - Marketing Data';
run;
proc gchart data=new;
title 'Histogram for Residuals - Marketing Data';
vbar ressales;
run;
/* Side-by-side boxplots by SMSA/store combination */
proc boxplot data=one;
plot sales*combined;
title 'Boxplots for Marketing Data';
run;
/* Per-cell summaries (variables corrected from the Problem 1 names) */
PROC SORT data=one; BY smsa store;
proc univariate data=one plot;
var sales; by smsa store;
run;
*------+
| Generated: Wednesday, March 9, 2005 17:27:23 |
| Data: C:\DOCUME~1\00013961\LOCALS~1\Temp\SAS Temporary Files\_TD2700\New |
+------*;
title;
footnote;
*** Probability Plots ***;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;
proc univariate data=Work.New noprint;
var RESSALES;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;
symbol;
goptions ftext= ctext= htext=;