Statistics 701

Using PROC ANOVA in SAS to Perform ANOVA and Multiple Comparisons

(Source of Notes: SAS Online Manual)

Comparing Group Means with PROC ANOVA and PROC GLM

When you have more than two means to compare, an F test in PROC ANOVA or PROC GLM tells you whether the means are significantly different from each other, but it does not tell you which means differ from which other means.

If you have specific comparisons in mind, you can use the CONTRAST statement in PROC GLM to make these comparisons. However, if you make many comparisons using some given significance level (0.05, for example), you are more likely to make a Type I error (incorrectly rejecting a hypothesis that the means are equal) simply because you have more chances to make the error.
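To see how quickly the error rate can grow, here is a rough worked calculation. It assumes, purely for illustration, that the m tests are independent. Pairwise comparisons of means are not independent, so the figure is an approximation rather than an exact rate. The symbols m (number of comparisons) and \alpha (per-comparison significance level) are notation introduced here.

```latex
% Chance of at least one Type I error in m independent tests at level \alpha
\[
  P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{m}
\]
% Six group means give m = \binom{6}{2} = 15 pairwise comparisons
\[
  1 - (1 - 0.05)^{15} \approx 0.54
\]
```

Under this simplification, testing all pairs among six means at the 0.05 level would give a better than even chance of at least one false rejection.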

Multiple comparison methods give you more detailed information about the differences among the means and enable you to control error rates for a multitude of comparisons. A variety of multiple comparison methods are available with the MEANS statement in both the ANOVA and GLM procedures, as well as the LSMEANS statement in PROC GLM. These are described in detail in "Multiple Comparisons" in Chapter 30, "The GLM Procedure."

An Example Using PROC ANOVA

The following example studies the effect of bacteria on the nitrogen content of red clover plants. The treatment factor is bacteria strain, and it has six levels. Five of the six levels consist of five different Rhizobium trifolii bacteria cultures combined with a composite of five Rhizobium meliloti strains. The sixth level is a composite of the five Rhizobium trifolii strains with the composite of the Rhizobium meliloti. Red clover plants are inoculated with the treatments, and nitrogen content is later measured in milligrams. The data are derived from an experiment by Erdman (1946) and are analyzed in Chapters 7 and 8 of Steel and Torrie (1980). The following DATA step creates the SAS data set Clover:

title 'Nitrogen Content of Red Clover Plants';

data Clover;
   input Strain $ Nitrogen @@;
   datalines;
3DOK1  19.4  3DOK1  32.6  3DOK1  27.0  3DOK1  32.1  3DOK1  33.0
3DOK5  17.7  3DOK5  24.8  3DOK5  27.9  3DOK5  25.2  3DOK5  24.3
3DOK4  17.0  3DOK4  19.4  3DOK4   9.1  3DOK4  11.9  3DOK4  15.8
3DOK7  20.7  3DOK7  21.0  3DOK7  20.5  3DOK7  18.8  3DOK7  18.6
3DOK13 14.3  3DOK13 14.4  3DOK13 11.8  3DOK13 11.6  3DOK13 14.2
COMPOS 17.3  COMPOS 19.4  COMPOS 19.1  COMPOS 16.9  COMPOS 20.8
;

The variable Strain contains the treatment levels, and the variable Nitrogen contains the response. The following statements produce the analysis.

proc anova;
   class Strain;
   model Nitrogen = Strain;
run;

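To perform multiple comparisons, add a MEANS statement. Because PROC ANOVA is interactive, you can submit the statement after the first RUN statement without re-invoking the procedure. The following statement requests the means of the Strain levels with Tukey's studentized range procedure, the Scheffe method, and the Bonferroni method.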

means Strain / tukey scheffe bon;

run;
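As a variation on the statement above, the MEANS statement also accepts options that change how the comparisons are reported. The sketch below assumes the Clover data set created earlier. CLDIFF and ALPHA= are standard MEANS-statement options, but the choice of Tukey's method and of the 0.05 level here is only for illustration.

```sas
/* Sketch: Tukey comparisons reported as confidence intervals
   for each pairwise difference instead of letter groupings. */
proc anova data=Clover;
   class Strain;
   model Nitrogen = Strain;
   /* CLDIFF requests confidence limits for all pairwise differences;
      ALPHA= sets the significance level (0.05 is the default). */
   means Strain / tukey cldiff alpha=0.05;
run;
quit;   /* PROC ANOVA is interactive, so QUIT ends the procedure */
```

With equal group sizes, as here, an interval that excludes zero corresponds to a pair of strains that share no grouping letter in the default display.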

The output of these commands is as follows.

Nitrogen Content of Red Clover Plants

11:48 Monday, April 9, 2001

The ANOVA Procedure

Class Level Information

Class Levels Values

Strain 6 3DOK1 3DOK13 3DOK4 3DOK5 3DOK7 COMPOS

Number of observations 30

Dependent Variable: Nitrogen

                                Sum of
Source              DF         Squares     Mean Square    F Value    Pr > F
Model                5      847.046667      169.409333      14.37    <.0001
Error               24      282.928000       11.788667
Corrected Total     29     1129.974667

R-Square    Coeff Var    Root MSE    Nitrogen Mean
0.749616     17.26515    3.433463         19.88667

Source              DF        Anova SS     Mean Square    F Value    Pr > F
Strain               5     847.0466667     169.4093333      14.37    <.0001

Tukey's Studentized Range (HSD) Test for Nitrogen

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05

Error Degrees of Freedom 24

Error Mean Square 11.78867

Critical Value of Studentized Range 4.37265

Minimum Significant Difference 6.7142

Means with the same letter are not significantly different.

Tukey Grouping        Mean      N    Strain

         A          28.820      5    3DOK1
         A
    B    A          23.980      5    3DOK5
    B
    B    C          19.920      5    3DOK7
    B    C
    B    C          18.700      5    COMPOS
         C
         C          14.640      5    3DOK4
         C
         C          13.260      5    3DOK13
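The minimum significant difference reported above can be reproduced by hand from the other printed quantities. The following is a worked check, assuming the standard equal-sample-size form of Tukey's criterion. The symbols q, MSE, and n are shorthand introduced here for the critical value of the studentized range, the error mean square, and the number of observations per strain.

```latex
\[
  \text{MSD} = q \sqrt{\frac{\text{MSE}}{n}}
             = 4.37265 \sqrt{\frac{11.78867}{5}}
             \approx 6.7142
\]
% Example: 3DOK1 vs 3DOK5 differ by 28.820 - 23.980 = 4.840 < 6.7142,
% so they share the letter A; 3DOK1 vs 3DOK7 differ by 8.900 > 6.7142,
% so they share no letter.
```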

Bonferroni (Dunn) t Tests for Nitrogen

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha 0.05

Error Degrees of Freedom 24

Error Mean Square 11.78867

Critical Value of t 3.25838

Minimum Significant Difference 7.0756

Means with the same letter are not significantly different.

Bon Grouping          Mean      N    Strain

         A          28.820      5    3DOK1
         A
    B    A          23.980      5    3DOK5
    B
    B    C          19.920      5    3DOK7
    B    C
    B    C          18.700      5    COMPOS
         C
         C          14.640      5    3DOK4
         C
         C          13.260      5    3DOK13
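The Bonferroni minimum significant difference can be checked in the same way. This sketch assumes the usual equal-sample-size form, in which the printed critical value of t is taken at level \alpha/(2m) for m = 15 pairwise comparisons with 24 error degrees of freedom. The symbols t, m, MSE, and n are notation introduced here.

```latex
\[
  \text{MSD} = t \sqrt{\frac{2\,\text{MSE}}{n}}
             = 3.25838 \sqrt{\frac{2 \times 11.78867}{5}}
             \approx 7.0756
\]
```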


Scheffe's Test for Nitrogen

NOTE: This test controls the Type I experimentwise error rate.

Alpha 0.05

Error Degrees of Freedom 24

Error Mean Square 11.78867

Critical Value of F 2.62065

Minimum Significant Difference 7.8605

Means with the same letter are not significantly different.

Scheffe Grouping      Mean      N    Strain

         A          28.820      5    3DOK1
         A
    B    A          23.980      5    3DOK5
    B
    B    C          19.920      5    3DOK7
    B    C
    B    C          18.700      5    COMPOS
         C
         C          14.640      5    3DOK4
         C
         C          13.260      5    3DOK13
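Scheffe's minimum significant difference can likewise be reproduced from the printed critical value of F. This is a worked check assuming the standard equal-sample-size form. The symbols k (number of strains, here 6), F, MSE, and n are notation introduced here.

```latex
\[
  \text{MSD} = \sqrt{(k - 1)\,F}\;\sqrt{\frac{2\,\text{MSE}}{n}}
             = \sqrt{5 \times 2.62065}\;\sqrt{\frac{2 \times 11.78867}{5}}
             \approx 7.8605
\]
```

For these data the three minimum significant differences increase from Tukey (6.7142) to Bonferroni (7.0756) to Scheffe (7.8605), yet all three methods produce the same letter groupings.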

Using PROC GLM in SAS for Analyzing Factorial Experiments

(Source of Notes: SAS Manual … Online)

Syntax

The following statements are available in PROC GLM.

PROC GLM < options > ;

CLASS variables ;

MODEL dependents=independents < / options > ;

CONTRAST 'label' effect values < ... effect values > < / options > ;

ESTIMATE 'label' effect values < ... effect values > < / options > ;

LSMEANS effects < / options > ;

MANOVA < test-options >< / detail-options > ;

MEANS effects < / options > ;

OUTPUT < OUT=SAS-data-set >

keyword=names < ... keyword=names > < / option > ;

RANDOM effects < / options > ;

REPEATED factor-specification < / options > ;

TEST < H=effects > E=effect < / options > ;

Although there are numerous statements and options available in PROC GLM, many applications use only a few of them. Often you can find the features you need by looking at an example or by quickly scanning through this section.

To use PROC GLM, the PROC GLM and MODEL statements are required. You can specify only one MODEL statement (in contrast to the REG procedure, for example, which allows several MODEL statements in the same PROC REG run). If your model contains classification effects, the classification variables must be listed in a CLASS statement, and the CLASS statement must appear before the MODEL statement. In addition, if you use a CONTRAST statement in combination with a MANOVA, RANDOM, REPEATED, or TEST statement, the CONTRAST statement must be entered first in order for the contrast to be included in the MANOVA, RANDOM, REPEATED, or TEST analysis.
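The ordering rules above can be sketched as a skeleton. It borrows the data set exp and the variables A, B, and Y from the two-factor example in these notes. The TEST statement, which uses A*B as the error term for A, is included only to illustrate statement order and is not part of the original analysis.

```sas
proc glm data=exp;
   /* CLASS must come before MODEL. */
   class A B;
   /* Only one MODEL statement is allowed per PROC GLM run. */
   model Y = A B A*B;
   /* CONTRAST is placed before TEST so that the contrast
      is included in the TEST analysis. */
   contrast 'A1 versus A2' A 1 -1;
   test h=A e=A*B;
run;
quit;
```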

An Example: A Two-Factor Analysis with Interaction Effects

title 'Analysis of Unbalanced 2-by-2 Factorial';

data exp;
   input A $ B $ Y;
   datalines;
A1 B1 12
A1 B1 14
A1 B2 11
A1 B2 9
A2 B1 20
A2 B1 18
A2 B2 17
;
run;

proc glm;
   class A B;
   model Y=A B A*B;
run;
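To see why the title calls this design unbalanced, it can help to tabulate the cell counts before fitting the model. The sketch below assumes the data set exp created above. PROC FREQ is not part of the original notes and is used here only as a quick check.

```sas
/* Cross-tabulate A by B to show the number of observations per cell. */
proc freq data=exp;
   /* NOROW, NOCOL, and NOPERCENT suppress percentages so that only
      the cell frequencies are printed. */
   tables A*B / norow nocol nopercent;
run;
```

From the data lines, three cells contain two observations each and the A2, B2 cell contains one, so the cell sizes are unequal.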


Output of this Run

Analysis of Unbalanced 2-by-2 Factorial 11:06 Monday, April 9, 2001

The GLM Procedure

Class Level Information

Class Levels Values

A 2 A1 A2

B 2 B1 B2

Number of observations 7

Dependent Variable: Y

                                Sum of
Source              DF         Squares     Mean Square    F Value    Pr > F
Model                3     91.71428571     30.57142857      15.29    0.0253
Error                3      6.00000000      2.00000000
Corrected Total      6     97.71428571

R-Square    Coeff Var    Root MSE           Y Mean
0.938596     9.801480    1.414214         14.42857

Source              DF       Type I SS     Mean Square    F Value    Pr > F
A                    1     80.04761905     80.04761905      40.02    0.0080
B                    1     11.26666667     11.26666667       5.63    0.0982
A*B                  1      0.40000000      0.40000000       0.20    0.6850

Source              DF     Type III SS     Mean Square    F Value    Pr > F
A                    1     67.60000000     67.60000000      33.80    0.0101
B                    1     10.00000000     10.00000000       5.00    0.1114
A*B                  1      0.40000000      0.40000000       0.20    0.6850

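Note: Type I SS are the sequential sums of squares, in which each effect is adjusted only for the effects listed before it in the MODEL statement. Type III SS are the partial sums of squares, in which each effect is adjusted for all other effects in the model. The two types agree for the last effect in the model (A*B here) but differ for A and B because the design is unbalanced.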
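One way to see the order dependence of Type I sums of squares is to fit the same model with the main effects listed in the opposite order. This is a sketch assuming the data set exp from above. SS1 and SS3 are MODEL-statement options that select which types of sums of squares are printed.

```sas
/* Original order: Type I SS for A is unadjusted; B is adjusted for A. */
proc glm data=exp;
   class A B;
   model Y = A B A*B / ss1 ss3;
run;
quit;

/* Reversed order: Type I SS for B is now unadjusted; A is adjusted for B.
   The Type III SS should be unchanged by the reordering. */
proc glm data=exp;
   class A B;
   model Y = B A A*B / ss1 ss3;
run;
quit;
```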


Performing Tests on Contrasts

CONTRAST Statement

CONTRAST 'label' effect values < ... effect values > < / options > ;

The CONTRAST statement enables you to perform custom hypothesis tests by specifying an L vector or matrix for testing the univariate hypothesis L β = 0 or the multivariate hypothesis L B M = 0. Thus, to use this feature you must be familiar with the details of the model parameterization that PROC GLM uses. For more information, see the "Parameterization of PROC GLM Models" section. All of the elements of the L vector may be given, or if only certain portions of the L vector are given, the remaining elements are constructed by PROC GLM from the context (in a manner similar to rule 4 discussed in the "Construction of Least-Squares Means" section). There is no limit to the number of CONTRAST statements you can specify, but they must appear after the MODEL statement. In addition, if you use a CONTRAST statement and a MANOVA, REPEATED, or TEST statement, appropriate tests for contrasts are carried out as part of the MANOVA, REPEATED, or TEST analysis. If you use a CONTRAST statement and a RANDOM statement, the expected mean square of the contrast is displayed. As a result of these additional analyses, the CONTRAST statement must appear before the MANOVA, REPEATED, RANDOM, or TEST statement.

In the CONTRAST statement,

label

identifies the contrast on the output. A label is required for every contrast specified. Labels must be enclosed in quotes.

effect

identifies an effect that appears in the MODEL statement, or the INTERCEPT effect. The INTERCEPT effect can be used when an intercept is fitted in the model. You do not need to include all effects that are in the MODEL statement.

values

are constants that are elements of the L vector associated with the effect.

Illustration of Using the Contrast Statement

We add two CONTRAST statements to the PROC GLM step.

proc glm;
   class A B;
   model Y=A B A*B;
   CONTRAST 'A1 versus A2' A 1 -1;
   CONTRAST 'B1 versus B2' B 1 -1;
run;

The first contrast compares the two levels of A, and the second compares the two levels of B.
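Because the model contains the A*B interaction, the interaction coefficients that go with each main-effect contrast are constructed by PROC GLM from the context. The sketch below spells them out explicitly and adds an ESTIMATE statement and an interaction contrast. It assumes the data set exp from above and that the A*B cells are ordered A1 B1, A1 B2, A2 B1, A2 B2, following the order of A and B in the CLASS statement. The labels are made up for illustration.

```sas
proc glm data=exp;
   class A B;
   model Y = A B A*B;
   /* Same comparison as 'A1 versus A2', with the interaction
      coefficients written out: the average of the A1 cells
      minus the average of the A2 cells. */
   contrast 'A1 versus A2 (explicit)' A 1 -1  A*B 0.5 0.5 -0.5 -0.5;
   /* ESTIMATE uses the same coefficient syntax but prints the
      estimated difference and its standard error. */
   estimate 'A1 minus A2' A 1 -1;
   /* Interaction contrast: (A1B1 - A1B2) - (A2B1 - A2B2). */
   contrast 'A by B interaction' A*B 1 -1 -1 1;
run;
quit;
```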

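In addition to the output shown above, this run produces the following lines. Each contrast has one degree of freedom and, because each factor has only two levels, reproduces the Type III test for the corresponding main effect.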

Contrast            DF     Contrast SS     Mean Square    F Value    Pr > F
A1 versus A2         1     67.60000000     67.60000000      33.80    0.0101
B1 versus B2         1     10.00000000     10.00000000       5.00    0.1114
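The contrast sums of squares above can be checked by hand from the cell means. This is a worked sketch, assuming that each main-effect contrast compares unweighted averages of the cell means, which is consistent with the printed values. The symbols \bar{y}_{ij}, n_{ij}, c_{ij}, and \hat{\psi}_A are notation introduced here for the cell means, cell sizes, contrast coefficients, and estimated contrast.

```latex
% Cell means from the data: \bar{y}_{11}=13, \bar{y}_{12}=10, \bar{y}_{21}=19, \bar{y}_{22}=17
% Cell sizes: n_{11}=n_{12}=n_{21}=2, n_{22}=1; coefficients c_{ij} = \pm 1/2
\[
  \hat{\psi}_A = \tfrac{1}{2}(\bar{y}_{11} + \bar{y}_{12}) - \tfrac{1}{2}(\bar{y}_{21} + \bar{y}_{22})
               = 11.5 - 18 = -6.5
\]
\[
  \text{SS}_A = \frac{\hat{\psi}_A^{2}}{\sum_{i,j} c_{ij}^{2} / n_{ij}}
              = \frac{(-6.5)^{2}}{\tfrac{1}{4}\left(\tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + 1\right)}
              = \frac{42.25}{0.625} = 67.6
\]
\[
  F = \frac{\text{SS}_A}{\text{MSE}} = \frac{67.6}{2} = 33.8
\]
```

The same steps for B give (16 - 13.5)^2 / 0.625 = 10, which matches the second line of the output.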