T Tests, ANOVA, and Regression Analysis

Here is a one-sample t test of the null hypothesis that mu = 0:

DATA ONESAMPLE; INPUT Y @@;

CARDS;

1 2 3 4 5 6 7 8 9 10

PROC MEANS T PRT; RUN;
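The same test can also be obtained with PROC TTEST (a minimal equivalent, not part of the output shown below; the H0= option names the null value and defaults to zero):

PROC TTEST DATA=ONESAMPLE H0=0; VAR Y; RUN;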

------

The SAS System

The MEANS Procedure

Analysis Variable : Y

t Value Pr > |t|

-------------------

5.74 0.0003

-------------------

------

Now an ANOVA on the same data but with no grouping variable:

PROC ANOVA; MODEL Y = ; run;

------

The SAS System

The ANOVA Procedure

Dependent Variable: Y

Source DF Sum of Squares Mean Square F Value Pr > F

Model 1 302.5000000 302.5000000 33.00 0.0003

Error 9 82.5000000 9.1666667

Uncorrected Total 10 385.0000000

R-Square Coeff Var Root MSE Y Mean

0.000000 55.04819 3.027650 5.500000

Source DF Anova SS Mean Square F Value Pr > F

Intercept 1 302.5000000 302.5000000 33.00 0.0003

------

Notice that the ANOVA F is simply the square of the one-sample t, and the one-tailed p from the ANOVA is identical to the two-tailed p from the t.
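As a quick check with the numbers above: t = 5.5/(3.0277/sqrt(10)) = 5.5/0.9574 = 5.74, and squaring the unrounded t gives 5.7446^2 = 33.00, exactly the ANOVA F.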

Now a regression analysis with the model Y = intercept + error.

PROC REG; MODEL Y = ; run;

------

The REG Procedure

Model: MODEL1

Dependent Variable: Y

Source DF Sum of Squares Mean Square F Value Pr > F

Model 0 0 . . .

Error 9 82.50000 9.16667

Corrected Total 9 82.50000

Root MSE 3.02765 R-Square 0.0000

Dependent Mean 5.50000 Adj R-Sq 0.0000

Coeff Var 55.04819

Parameter Estimates

Variable DF Parameter Estimate Standard Error t Value Pr > |t|

Intercept 1 5.50000 0.95743 5.74 0.0003

------

Notice that this intercept-only regression replicates the preceding analyses: the test of the intercept is the one-sample t test.
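In numbers: the intercept estimate is the sample mean, 5.50; its standard error is s/sqrt(n) = 3.02765/sqrt(10) = 0.95743; so t = 5.50/0.95743 = 5.74 on 9 df, with p = 0.0003, and squaring it recovers the ANOVA F of 33.00.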

Now consider a two-independent-groups t test with pooled variances; the null hypothesis is mu1 - mu2 = 0:

DATA TWOSAMPLE; INPUT X Y @@;

CARDS;

1 1 1 2 1 3 1 4 1 5

2 6 2 7 2 8 2 9 2 10

PROC TTEST; CLASS X; VAR Y; RUN;

------

The SAS System

T-Tests

Variable Method Variances DF t Value Pr > |t|

Y Pooled Equal 8 -5.00 0.0011

------
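These values are easily reproduced by hand: the group means are 3 and 8, each group's variance is 2.5, so the pooled variance is 2.5; the standard error of the difference is sqrt(2.5*(1/5 + 1/5)) = 1.00; and t = (3 - 8)/1.00 = -5.00 on 5 + 5 - 2 = 8 df.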

Now an ANOVA on the same data:

PROC ANOVA; CLASS X; MODEL Y = X; RUN;

------

The ANOVA Procedure

Dependent Variable: Y

Source DF Sum of Squares Mean Square F Value Pr > F

Model 1 62.50000000 62.50000000 25.00 0.0011

Error 8 20.00000000 2.50000000

Corrected Total 9 82.50000000

R-Square Coeff Var Root MSE Y Mean

0.757576 28.74798 1.581139 5.500000

Source DF Anova SS Mean Square F Value Pr > F

X 1 62.50000000 62.50000000 25.00 0.0011

------

Notice that the ANOVA F is simply the square of the independent-samples t, and the one-tailed ANOVA p is identical to the two-tailed p from the t.
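Check: (-5.00)^2 = 25.00, the ANOVA F, and both analyses report p = 0.0011.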

And finally, a replication of the ANOVA with a regression analysis:

PROC REG; MODEL Y = X; run;

------

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: Y

Number of Observations Read 10

Number of Observations Used 10

Analysis of Variance

Source DF Sum of Squares Mean Square F Value Pr > F

Model 1 62.50000 62.50000 25.00 0.0011

Error 8 20.00000 2.50000

Corrected Total 9 82.50000

Root MSE 1.58114 R-Square 0.7576

Dependent Mean 5.50000 Adj R-Sq 0.7273

Coeff Var 28.74798

Parameter Estimates

Variable DF Parameter Estimate Standard Error t Value Pr > |t|

Intercept 1 -2.00000 1.58114 -1.26 0.2415

X 1 5.00000 1.00000 5.00 0.0011
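The fitted line is predicted Y = -2 + 5X, which gives 3 at X = 1 and 8 at X = 2, the two group means, so the slope of 5 is just the difference between those means. The slope's t of 5.00 is, apart from sign, the pooled-variances t, and 5.00^2 = 25.00 is the ANOVA F, with the same p of 0.0011.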

OK, but what if we have more than two groups? Show me that the ANOVA is a regression analysis in that case.

Here is the SAS program, with data:

data Lotus;

input Dose N; Do I=1 to N; Input Illness @@; output; end;
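* each pair of data lines below gives a Dose and its sample size N, followed by N Illness scores (one observation is output per score);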

cards;

0 20

101 101 101 104 104 105 110 111 111 113 114 79 89 91 94 95 96 99 99 99

10 20

100 65 65 67 68 80 81 82 85 87 87 88 88 91 92 94 95 94 96 96

20 20

64 75 75 76 77 79 79 80 80 81 81 81 82 83 83 85 87 88 90 96

30 20

100 105 108 80 82 85 87 87 87 89 90 90 92 92 92 95 95 97 98 99

40 20

101 102 102 105 108 109 112 119 119 123 82 89 92 94 94 95 95 97 98 99

*****************************************************************************;

proc GLM data=Lotus; class Dose;

model Illness = Dose / ss1;

title 'Here we have a traditional one-way independent samples ANOVA'; run;

*****************************************************************************;

data Polynomial; set Lotus; Quadratic=Dose*Dose; Cubic=Dose**3; Quartic=Dose**4;

proc GLM data=Polynomial; model Illness = Dose Quadratic Cubic Quartic / ss1;

title 'Here we have a polynomial regression analysis.'; run;

*****************************************************************************;
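Incidentally, the new variables are not strictly required: PROC GLM also accepts polynomial terms written directly on the MODEL statement, so an equivalent run (a sketch, not shown in the output below) would be

proc GLM data=Lotus; model Illness = Dose Dose*Dose Dose*Dose*Dose Dose*Dose*Dose*Dose / ss1; run;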

Here is the output:

Here we have a traditional one-way independent samples ANOVA

The GLM Procedure

Dependent Variable: Illness

Source DF Sum of Squares Mean Square F Value Pr > F

Model 4 6791.54000 1697.88500 20.78 <.0001

Error 95 7762.70000 81.71263

Corrected Total 99 14554.24000

R-Square Coeff Var Root MSE Illness Mean

0.466637 9.799983 9.039504 92.24000

Source DF Type I SS Mean Square F Value Pr > F

Dose 4 6791.540000 1697.885000 20.78 <.0001

------

Here we have a polynomial regression analysis.

The GLM Procedure

Number of observations 100


Dependent Variable: Illness

Source DF Sum of Squares Mean Square F Value Pr > F

Model 4 6791.54000 1697.88500 20.78 <.0001

Error 95 7762.70000 81.71263

Corrected Total 99 14554.24000

Note that the polynomial regression produced exactly the same F, p, SS, and MS as the traditional ANOVA. This is as it must be: with five dose groups there are four degrees of freedom between groups, and the four polynomial terms (linear, quadratic, cubic, and quartic) together account for exactly that between-groups variation.

R-Square Coeff Var Root MSE Illness Mean

0.466637 9.799983 9.039504 92.24000

Source DF Type I SS Mean Square F Value Pr > F

Dose 1 174.845000 174.845000 2.14 0.1468

Quadratic 1 6100.889286 6100.889286 74.66 <.0001

Cubic 1 389.205000 389.205000 4.76 0.0315

Quartic 1 126.600714 126.600714 1.55 0.2163

------
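As a final check, the four Type I sums of squares add back up to the between-groups SS from the traditional ANOVA: 174.845 + 6100.889 + 389.205 + 126.601 = 6791.540. The trend analysis has simply partitioned that SS into linear, quadratic, cubic, and quartic components, and here the quadratic trend is by far the largest.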


November, 2006