Statistics 5372: Experimental Statistics

Exam I Key

Problem 1. Drive Time Analysis

Data Set/Problem Description

The problem is to determine the effects of route and time of day on drive time for trucks leaving a certain factory. Three different departure times were considered (morning, afternoon, and evening) and 2 routes were selected. The experiment involved randomly selecting 10 trucks leaving the factory at each of the three time periods. Then for each time of day, 5 trucks were randomly selected to take route 1 while the other 5 were assigned to route 2. The data were analyzed as a 2-factor ANOVA model with route and time of day considered as fixed effects.

Key Results of the Analysis

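The analysis of variance table (Table 1) indicates that there is a significant interaction between route and time of day (p < .0001). Thus, we do not test for main effects, but instead we compare the cell means. The cell means computed by SAS are given in Table 2. The LSD for comparing cell means at the α = .05 level of significance is

\[ \mathrm{LSD} = t_{.025,\,24}\sqrt{\frac{2\,\mathrm{MSE}}{n}}, \]

where MSE = 373.70 is the error mean square from Table 1 (24 error degrees of freedom) and n = 5 is the number of trucks per cell. The value used for the comparisons in Table 3 is LSD = 25.34.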

Interaction plots of the cell means are shown in Figure 1, and the LSD multiple comparisons are given in Table 3. From the interaction plots we see that route 2 is a particularly bad route to take in the morning, and that in general, morning drive time is longest. The multiple comparisons show that taking route 2 in the evening results in a significantly shorter drive time than either route in the morning or afternoon. Evening drive time with either route is significantly shorter than either the morning or the afternoon drive time using route 1. Similarly, the morning drive time using route 2 is the worst time/route combination, while the morning drive time using route 1 is significantly longer than route 2 in the afternoon and both evening drive times. Route does not make a significant difference in either the afternoon or evening.
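The pairwise comparisons described here can be reproduced with a few lines of Python. This is only an illustrative sketch and not part of the original SAS analysis: the cell means are copied from Table 2, the LSD value of 25.34 is taken as reported in Table 3, and the helper name `lsd_comparisons` is hypothetical.

```python
from itertools import combinations

# Cell means copied from Table 2 (M/A/E = morning/afternoon/evening, 1/2 = route).
cell_means = {"M2": 575.6, "M1": 511.0, "A1": 486.6,
              "A2": 467.6, "E1": 452.0, "E2": 438.6}
LSD = 25.34  # value reported in Table 3 (taken as given, not recomputed here)


def lsd_comparisons(means, lsd):
    """Return (pair, difference, significant) for every pair of cells.

    Cells are ordered from largest to smallest mean, so each
    difference is non-negative.
    """
    ordered = sorted(means, key=means.get, reverse=True)
    results = []
    for a, b in combinations(ordered, 2):
        diff = round(means[a] - means[b], 1)
        results.append(((a, b), diff, diff > lsd))
    return results


comparisons = lsd_comparisons(cell_means, LSD)
not_significant = [pair for pair, diff, sig in comparisons if not sig]

# Print the comparisons, marking non-significant differences with "x".
for (a, b), diff, sig in comparisons:
    print(f"{a} vs {b}: {diff:6.1f} {'' if sig else 'x'}")
```

Under these assumptions the only pairs that fail to exceed the LSD are adjacent ones in the ordered list of means (M1/A1, A1/A2, A2/E1, E1/E2), which is the pattern the discussion above relies on.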

In Figure 2 we present side-by-side boxplots of the data as well as a probability plot and histogram of the residuals. The boxplots show fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.

Conclusions in the Language of the Problem

The morning drive time is clearly the longest. If driving must be done in the morning, then route 1 should be taken. The shortest drive times are in the evening. Route makes the most difference in the morning, in which case, as mentioned, route 1 is preferred. On the other hand, route does not make a significant difference in either the afternoon or evening.

Appendices:

Appendix A1. Tables and Figures Cited in the Report

Table 1. SAS GLM Output for Problem 1 – Drive Time Data

2-Factor ANOVA - Drive-Time Data

The GLM Procedure

Dependent Variable: drive

                               Sum of
Source             DF         Squares    Mean Square   F Value   Pr > F
Model               5     61776.56667    12355.31333     33.06   <.0001
Error              24      8968.80000      373.70000
Corrected Total    29     70745.36667

R-Square    Coeff Var    Root MSE    drive Mean
0.873224     3.956742    19.33132      488.5667

Source             DF     Type III SS    Mean Square   F Value   Pr > F
route               1       864.03333      864.03333      2.31   0.1414
tofday              2     49992.26667    24996.13333     66.89   <.0001
route*tofday        2     10920.26667     5460.13333     14.61   <.0001
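As a cross-check on the ANOVA table, the sums of squares for a balanced two-factor design can be computed directly from the raw drive times listed in Appendix B1. The following Python sketch is an illustration only and is not part of the original SAS analysis; it assumes the balanced layout described in the problem (5 trucks in each of the 6 route by time-of-day cells), and all variable names are chosen here for illustration.

```python
# Drive times from Appendix B1, keyed by (route, time of day); 5 trucks per cell.
data = {
    (1, 1): [490, 553, 489, 504, 519],
    (1, 2): [511, 490, 489, 492, 451],
    (1, 3): [435, 468, 463, 450, 444],
    (2, 1): [585, 589, 575, 570, 559],
    (2, 2): [456, 460, 464, 485, 473],
    (2, 3): [406, 422, 459, 442, 464],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


routes = sorted({r for r, _ in data})
times = sorted({t for _, t in data})
n = 5  # replicates per cell (balanced design)

grand = mean(y for cell in data.values() for y in cell)
cell_mean = {k: mean(v) for k, v in data.items()}
# In a balanced design the marginal means are averages of the cell means.
route_mean = {r: mean(cell_mean[(r, t)] for t in times) for r in routes}
time_mean = {t: mean(cell_mean[(r, t)] for r in routes) for t in times}

ss_route = n * len(times) * sum((route_mean[r] - grand) ** 2 for r in routes)
ss_time = n * len(routes) * sum((time_mean[t] - grand) ** 2 for t in times)
ss_inter = n * sum(
    (cell_mean[(r, t)] - route_mean[r] - time_mean[t] + grand) ** 2
    for r in routes for t in times
)
ss_error = sum((y - cell_mean[k]) ** 2 for k, v in data.items() for y in v)

# Mean square error and the F ratio for the interaction.
df_error = len(routes) * len(times) * (n - 1)
df_inter = (len(routes) - 1) * (len(times) - 1)
mse = ss_error / df_error
f_inter = (ss_inter / df_inter) / mse

print(f"SS route={ss_route:.5f} tofday={ss_time:.5f} "
      f"route*tofday={ss_inter:.5f} error={ss_error:.5f}")
print(f"MSE={mse:.5f}  F(route*tofday)={f_inter:.2f}")
```

The printed values can be compared line by line with the Type III sums of squares, the error mean square, and the interaction F value in the table.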

Table 2. Cell Means for Drive Time Data

Analysis Variable : drive

route    tofday           Mean        Std Dev
    1         1    511.0000000     26.4669605
    1         2    486.6000000     21.8471966
    1         3    452.0000000     13.5462172
    2         1    575.6000000     11.9916638
    2         2    467.6000000     11.5887877
    2         3    438.6000000     24.5519857

Table 3. Calculations for LSD comparisons for Problem 1 -- Drive Time Cell Means

Code: M=morning, A=afternoon, E=evening, 1=Route 1, 2=Route 2

Cell      M2      M1      A1      A2      E1      E2
Mean   575.6   511.0   486.6   467.6   452.0   438.6

Comparison    Actual Difference (lsd = 25.34)
M2 vs E2      137.0
M2 vs E1      123.6
M2 vs A2      108.0
M2 vs A1       89.0
M2 vs M1       64.6
M1 vs E2       72.4
M1 vs E1       59.0
M1 vs A2       43.4
M1 vs A1       24.4  x
A1 vs E2       48.0
A1 vs E1       34.6
A1 vs A2       19.0  x
A2 vs E2       29.0
A2 vs E1       15.6  x
E1 vs E2       13.4  x

x = difference is less than the lsd (not significant)

Cell means joined by a common underline are not significantly different:

M2    M1    A1    A2    E1    E2
      --------
            --------
                  --------
                        --------

Figure 1. Interaction Plots for Problem 1 - Drive Time Data

Figure 2. Diagnostic Plots for Problem 1 – Drive Time Data

Appendix B1. SAS Code for Problem 1 – Drive Time Data

DATA one;

INPUT route tofday drive combined$;

datalines;

1 1 490 1M

1 1 553 1M

1 1 489 1M

1 1 504 1M

1 1 519 1M

1 2 511 1A

1 2 490 1A

1 2 489 1A

1 2 492 1A

1 2 451 1A

1 3 435 1E

1 3 468 1E

1 3 463 1E

1 3 450 1E

1 3 444 1E

2 1 585 2M

2 1 589 2M

2 1 575 2M

2 1 570 2M

2 1 559 2M

2 2 456 2A

2 2 460 2A

2 2 464 2A

2 2 485 2A

2 2 473 2A

2 3 406 2E

2 3 422 2E

2 3 459 2E

2 3 442 2E

2 3 464 2E

;

PROC GLM;
CLASS route tofday;
MODEL drive=route tofday route*tofday;
output out=new r=resid;
means route tofday/lsd;
TITLE '2-Factor ANOVA - Drive-Time Data';
run;

PROC SORT data=one; BY route tofday;

PROC MEANS mean std data=one; BY route tofday; OUTPUT OUT=cells MEAN=drive;
Title 'Cell Means for Drive Time Data';
RUN;

PROC GPLOT data=cells;
TITLE 'Plot of Means';
PLOT drive*tofday=route;
Title 'Interaction Plot - Drive Time Data';
SYMBOL1 V=CIRCLE I=JOIN C=BLACK;
SYMBOL2 V=DOT I=JOIN C=BLACK;
RUN;

PROC GPLOT data=cells;
PLOT drive*route=tofday;
SYMBOL1 V=M I=JOIN C=BLACK;
SYMBOL2 V=A I=JOIN C=BLACK;
SYMBOL3 V=E I=JOIN C=BLACK;
RUN;

proc univariate data=new normalplot;
var resid;
title 'Normal Probability Plot for Residuals - Drive Time Data';
run;

proc gchart data=new;
vbar resid;
run;

proc boxplot data=one;
plot drive*combined;
title 'Boxplots for Drive Time Data';
run;

PROC SORT data=one; BY route tofday;

proc univariate data=one plot;
var drive; by route tofday;
run;

*------+

| Generated: Wednesday, March 9, 2005 16:18:40 |

| Data: C:\DOCUME~1\00013961\LOCALS~1\Temp\SAS Temporary Files\_TD2700\New |

+------*;

title;

footnote;

*** Probability Plots ***;

goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;

proc univariate data=Work.New noprint;
var RESID;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;

symbol;
goptions ftext= ctext= htext=;

Problem 2. Comparison of Training Programs

Data Set/Problem Description

In this experiment, four training programs: A, B, C, and D are compared. Twenty employees were randomly assigned to the four programs. After completion of the courses, each person was required to assemble a piece of equipment, and the time to complete the assembly was recorded. The data are shown in the accompanying SAS code and were analyzed as a 1-factor ANOVA model where training program is a fixed effect.

Key Results of the Analysis

The analysis of variance table (Table 1) indicates that we should reject the null hypothesis that the training programs are equally effective (p = .004). Thus, in order to examine the comparisons among programs, we use LSD to perform multiple comparisons. The LSD for comparing means at the α = .05 level of significance is

\[ \mathrm{LSD} = t_{.025,\,16}\sqrt{\frac{2\,\mathrm{MSE}}{n}} = 2.120\sqrt{\frac{2(7.575)}{5}} = 3.69. \]

This value for LSD is given in the SAS output in Table 1 along with the results of the comparisons. There it can be seen that people in training program C took significantly longer than those in either program A or program B. The program D mean is not significantly different from those of any of the other programs.
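The LSD value and the resulting comparisons can be checked by hand or with a short script. The Python sketch below is illustrative only and not part of the original SAS analysis; the critical t value, error mean square, and program means are copied from the SAS output in Table 1, and the variable names are chosen here for illustration.

```python
import math
from itertools import combinations

# Values copied from the SAS output in Table 1.
t_crit = 2.11991   # critical value of t (alpha = 0.05, 16 error df)
mse = 7.575        # error mean square
n = 5              # employees per training program

# Least significant difference for two means, each based on n observations.
lsd = t_crit * math.sqrt(2 * mse / n)

# Program means from the "t Grouping" table, largest first.
program_means = {"C": 67.4, "D": 64.2, "A": 61.0, "B": 60.6}

# Pairs of programs whose means differ by more than the LSD.
significant_pairs = [
    (a, b)
    for a, b in combinations(program_means, 2)
    if abs(program_means[a] - program_means[b]) > lsd
]

print(f"LSD = {lsd:.4f}")
print("Significantly different pairs:", significant_pairs)
```

With these inputs the only differences exceeding the LSD are C versus A and C versus B, in agreement with the letter groupings in the SAS output.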

In Figure 1 we present side-by-side boxplots of the data as well as a probability plot and histogram of the residuals. The boxplots show fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.

Conclusions in the Language of the Problem

On the basis of this study, programs A and B are preferable to program C, all other factors (e.g., cost of the program) being equal.

Appendices:

Appendix A2. Tables and Figures Cited in the Report

Table 1. SAS GLM Output for Problem 2 – Training Program Comparison

The GLM Procedure

Dependent Variable: time

                               Sum of
Source             DF         Squares    Mean Square   F Value   Pr > F
Model               3     151.0000000     50.3333333      6.64   0.0040
Error              16     121.2000000      7.5750000
Corrected Total    19     272.2000000

R-Square    Coeff Var    Root MSE    time Mean
0.554739     4.347981    2.752272     63.30000

Source             DF     Type III SS    Mean Square   F Value   Pr > F
program             3     151.0000000     50.3333333      6.64   0.0040

t Tests (LSD) for time

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                              0.05
Error Degrees of Freedom             16
Error Mean Square                 7.575
Critical Value of t             2.11991
Least Significant Difference     3.6901

Means with the same letter are not significantly different.

t Grouping      Mean    N    program
         A    67.400    5    C
         A
    B    A    64.200    5    D
    B
    B         61.000    5    A
    B
    B         60.600    5    B

Figure 1. Diagnostic Plots for Problem 2 – Training Program Data

Appendix B2. SAS Code for Problem 2 – Training Program Data

data training;

input program$ time;

datalines;

A 60

A 65

A 57

A 62

A 61

B 57

B 61

B 63

B 58

B 64

C 64

C 65

C 70

C 68

C 70

D 64

D 64

D 61

D 65

D 67

;

proc glm;
class program;
model time=program;
title 'ANOVA --- Training Program - Ex.1, Prob I';
output out=new r=restrain;
means program/lsd;
run;

PROC means mean var n;
class program;
run;

proc sort;
by program;
run;

PROC BOXPLOT;
plot time*program;
title 'Side-by-Side Box Plots for Training Data';
run;

proc univariate;
var restrain;
run;

proc gchart;
title 'Histogram for Residuals from Training Data';
vbar restrain;
run;

proc univariate normalplot;
var restrain;
title 'Normal Probability Plot for Residuals - Training Data';
run;

*** Probability Plots ***;

goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;

proc univariate data=Work.New noprint;
var restrain;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;

symbol;
goptions ftext= ctext= htext=;

proc univariate data=training plot;
var time; by program;
run;


Problem 3. Athletic Shoe Marketing Study

Data Set/Problem Description

This study involved the effect of advertising on sales of a type of athletic shoe. The study was conducted in a randomly selected set of 3 standard metropolitan areas (SMSA’s) within the US and in three randomly selected department stores. Four weeks of sales were randomly selected for analysis within each SMSA/Chain Store combination. The data are given below. The data are analyzed using a 2-factor ANOVA with area (SMSA) and store both being random effects.

Key Results of the Analysis

The analysis of variance table (Table 1) is based on the fact that SMSA and store are random effects. Thus, we do not use the F tests in the body of the table, but instead we use those given by the “/test” option. There it can be seen that the variance component due to the interaction between SMSA and store is highly significant (p < .0001) while the variance components due to SMSA (p = .4583) and store (p = .2778) are not significant. In order to visualize the interaction, in Figure 1 we show the interaction plot between SMSA and store. Although we are not interested in the specific interactions shown among these stores and SMSA’s, the plots show that there is interaction among stores and SMSA’s. That is, certain stores do not always have better sales, regardless of the SMSA in which they are located. For example, in the data collected, store 2 was particularly strong in SMSA 3 but had similar sales to the other stores in the remaining SMSA’s. Using the expected mean squares in Table 1, the variance component due to the SMSA by store interaction is estimated to be

\[ \hat{\sigma}^2_{\text{smsa}\times\text{store}} = \frac{MS(\text{smsa*store}) - MS(\text{Error})}{4} = \frac{9034.99 - 244.39}{4} = 2197.65. \]
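The random-effects F tests and the method-of-moments variance component estimates follow directly from the expected mean squares in the SAS output. The Python sketch below is illustrative only and not part of the original SAS analysis; the mean squares are copied from Table 1 in Appendix A3, the coefficients 4 and 12 are those shown in the "Type III Expected Mean Square" listing, and the variable names are chosen here for illustration.

```python
# Mean squares copied from the SAS output (Table 1, Appendix A3).
ms_smsa = 8623.36111
ms_store = 16212.86111
ms_inter = 9034.98611   # smsa*store
ms_error = 244.38889

# Coefficients from the Type III expected mean squares.
n_rep = 4         # weeks per SMSA/store cell: multiplies Var(smsa*store)
n_per_level = 12  # observations per SMSA or per store: multiplies Var(smsa), Var(store)

# With both factors random, the main effects are tested against the
# interaction mean square; the interaction is tested against MS(Error).
f_smsa = ms_smsa / ms_inter
f_store = ms_store / ms_inter
f_inter = ms_inter / ms_error

# Method-of-moments estimates obtained by equating mean squares to
# their expectations and solving for each variance component.
var_inter = (ms_inter - ms_error) / n_rep
var_store = (ms_store - ms_inter) / n_per_level
var_smsa = (ms_smsa - ms_inter) / n_per_level  # comes out negative here

print(f"F: smsa={f_smsa:.2f}  store={f_store:.2f}  smsa*store={f_inter:.2f}")
print(f"Var(smsa*store)={var_inter:.2f}  Var(store)={var_store:.2f}  "
      f"Var(smsa)={var_smsa:.2f}")
```

The F ratios match those reported under "Tests of Hypotheses for Random Model Analysis of Variance". The moment estimate for SMSA is negative in this sketch; such an estimate is commonly reported as zero, which is consistent with the non-significant test for that component.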

In Figure 2 the side-by-side boxplots of the data are presented as well as a probability plot and histogram of the residuals. The boxplots show fairly consistent variation and suggest that equality of variances is a reasonable assumption. The probability plot and histogram provide no evidence to reject normality.

Conclusions in the Language of the Problem

The only significant variance component was due to store by area interaction. This suggests that relative strength of sales for particular stores tends to vary by area, i.e. some stores tend to be stronger in one area of the country while others will show relative strength in another area.
Appendices:

Appendix A3. Tables and Figures Cited in the Report

Table 1. SAS GLM Output for Problem 3 – Athletic Shoe Marketing Study

2-Factor Random Effects ANOVA - Sales

The GLM Procedure

Dependent Variable: sales

                               Sum of
Source             DF         Squares    Mean Square   F Value   Pr > F
Model               8     85812.38889    10726.54861     43.89   <.0001
Error              27      6598.50000      244.38889
Corrected Total    35     92410.88889

R-Square    Coeff Var    Root MSE    sales Mean
0.928596     10.39117    15.63294      150.4444

Source             DF     Type III SS    Mean Square   F Value   Pr > F
smsa                2     17246.72222     8623.36111     35.29   <.0001
store               2     32425.72222    16212.86111     66.34   <.0001
smsa*store          4     36139.94444     9034.98611     36.97   <.0001

Source        Type III Expected Mean Square
smsa          Var(Error) + 4 Var(smsa*store) + 12 Var(smsa)
store         Var(Error) + 4 Var(smsa*store) + 12 Var(store)
smsa*store    Var(Error) + 4 Var(smsa*store)

Tests of Hypotheses for Random Model Analysis of Variance

Dependent Variable: sales

Source               DF    Type III SS    Mean Square    F Value    Pr > F
smsa                  2          17247    8623.361111       0.95    0.4583
store                 2          32426          16213       1.79    0.2778
Error                 4          36140    9034.986111
Error: MS(smsa*store)

Source               DF    Type III SS    Mean Square    F Value    Pr > F
smsa*store            4          36140    9034.986111      36.97    <.0001
Error: MS(Error)     27    6598.500000     244.388889

Figure 1. Interaction Plots for Problem 3 - Marketing Data

Figure 2. Diagnostic Plots for Problem 3 – Marketing Data

Appendix B3. SAS Code for Problem 3 – Athletic Shoe Marketing Study

DATA one;

INPUT smsa store sales combined$;

datalines;

1 1 129 11

1 1 111 11

1 1 128 11

1 1 148 11

1 2 187 12

1 2 171 12

1 2 145 12

1 2 166 12

1 3 150 13

1 3 102 13

1 3 127 13

1 3 125 13

2 1 138 21

2 1 152 21

2 1 125 21

2 1 135 21

2 2 117 22

2 2 130 22

2 2 142 22

2 2 119 22

2 3 110 23

2 3 125 23

2 3 142 23

2 3 123 23

3 1 150 31

3 1 128 31

3 1 115 31

3 1 128 31

3 2 280 32

3 2 300 32

3 2 295 32

3 2 261 32

3 3 150 33

3 3 108 33

3 3 119 33

3 3 135 33

;

PROC GLM;
CLASS smsa store;
MODEL sales= smsa store smsa*store;
output out=new r=ressales;
random smsa store smsa*store/test;
TITLE '2-Factor Random Effects ANOVA - Marketing Data';
run;

PROC SORT data=one; BY smsa store;

PROC MEANS mean std data=one; BY smsa store; OUTPUT OUT=cells MEAN=sales;
Title 'Cell Means for Sales Data';
RUN;

PROC GPLOT data=cells;
TITLE 'Plot of Means';
PLOT sales*smsa=store;
Title 'Interaction Plot - Marketing Data';
SYMBOL1 V=1 I=JOIN C=BLACK;
SYMBOL2 V=2 I=JOIN C=BLACK;
SYMBOL3 V=3 I=JOIN C=BLACK;
RUN;

PROC GPLOT data=cells;
PLOT sales*store=smsa;
SYMBOL1 V=1 I=JOIN C=BLACK;
SYMBOL2 V=2 I=JOIN C=BLACK;
SYMBOL3 V=3 I=JOIN C=BLACK;
RUN;

proc univariate data=new normalplot;
var ressales;
title 'Normal Probability Plot for Residuals - Marketing Data';
run;

proc gchart data=new;
title 'Histogram for Residuals - Marketing Data';
vbar ressales;
run;

proc boxplot data=one;
plot sales*combined;
title 'Boxplots for Marketing Data';
run;

PROC SORT data=one; BY smsa store;

proc univariate data=one plot;
var sales; by smsa store;
run;

*------+

| Generated: Wednesday, March 9, 2005 17:27:23 |

| Data: C:\DOCUME~1\00013961\LOCALS~1\Temp\SAS Temporary Files\_TD2700\New |

+------*;

title;

footnote;

*** Probability Plots ***;

goptions ftext=SWISS ctext=BLACK htext=1 cells;
symbol v=SQUARE c=BLUE h=1 cells;

proc univariate data=Work.New noprint;
var RESSALES;
probplot / caxes=BLACK cframe=CXF7E1C2 waxis= 1
hminor=0 vminor=0 name='PROB'
normal( mu=est sigma=est color=BLUE l=1
w=1);
inset normal;
run;

symbol;
goptions ftext= ctext= htext=;