T Tests, ANOVA, and Regression Analysis
Here is a one-sample t test of the null hypothesis that mu = 0:
DATA ONESAMPLE; INPUT Y @@;
CARDS;
1 2 3 4 5 6 7 8 9 10
;
PROC MEANS T PRT; RUN;
------
The SAS System
The MEANS Procedure
Analysis Variable : Y
t Value Pr > |t|
-------------------
5.74 0.0003
-------------------
------
Now an ANOVA on the same data but with no grouping variable:
PROC ANOVA; MODEL Y = ; run;
------
The SAS System
The ANOVA Procedure
Dependent Variable: Y
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 302.5000000 302.5000000 33.00 0.0003
Error 9 82.5000000 9.1666667
Uncorrected Total 10 385.0000000
R-Square Coeff Var Root MSE Y Mean
0.000000 55.04819 3.027650 5.500000
Source DF Anova SS Mean Square F Value Pr > F
Intercept 1 302.5000000 302.5000000 33.00 0.0003
------
Notice that the ANOVA F is simply the square of the one-sample t, and the one-tailed p from the ANOVA is identical to the two-tailed p from the t.
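As a quick arithmetic check (a sketch that is not part of the original lesson), you can compute the t from the mean, standard deviation, and n reported above and square it to get the ANOVA F:
* Not in the original lesson: t = mean / (SD / sqrt(n)), and the
  intercept-only ANOVA F is simply t squared;
DATA _NULL_;
  T = 5.5 / (3.02765 / SQRT(10));   * 5.74, the one-sample t;
  F = T**2;                         * 33.00, the ANOVA F;
  PUT T= F=;
RUN;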
Now a regression analysis with the model Y = intercept + error.
PROC REG; MODEL Y = ; run;
------
The REG Procedure
Model: MODEL1
Dependent Variable: Y
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 0 0 . . .
Error 9 82.50000 9.16667
Corrected Total 9 82.50000
Root MSE 3.02765 R-Square 0.0000
Dependent Mean 5.50000 Adj R-Sq 0.0000
Coeff Var 55.04819
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 5.50000 0.95743 5.74 0.0003
------
Notice that the ANOVA is replicated: the error SS (82.5) and MS (9.17) are the same, and the intercept t of 5.74 with p = 0.0003 is the one-sample t.
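Why does that happen? In the intercept-only regression the estimated intercept is the sample mean, and its standard error is the sample standard deviation divided by the square root of n, so the intercept t is the one-sample t. A minimal check (not part of the original lesson), using the values reported above:
DATA _NULL_;                 * not in the original lesson;
  SE = 3.02765 / SQRT(10);   * 0.95743, the reported standard error of the intercept;
  T  = 5.5 / SE;             * 5.74, the reported intercept t;
  PUT SE= T=;
RUN;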
Now consider a two independent groups t test with pooled variances; the null hypothesis is mu1 - mu2 = 0:
DATA TWOSAMPLE; INPUT X Y @@;
CARDS;
1 1 1 2 1 3 1 4 1 5
2 6 2 7 2 8 2 9 2 10
;
PROC TTEST; CLASS X; VAR Y; RUN;
------
The SAS System
T-Tests
Variable Method Variances DF t Value Pr > |t|
Y Pooled Equal 8 -5.00 0.0011
------
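As a by-hand check (not part of the original lesson): the group means are 3 and 8, each group variance is 2.5, and n = 5 per group, so the pooled-variances t is (3 - 8) / sqrt(2.5*(1/5 + 1/5)) = -5.00 on 8 df, matching PROC TTEST.
DATA _NULL_;                              * not in the original lesson;
  T = (3 - 8) / SQRT(2.5*(1/5 + 1/5));    * -5.00, the pooled t with 8 df;
  PUT T=;
RUN;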
Now an ANOVA on the same data:
PROC ANOVA; CLASS X; MODEL Y = X; RUN;
------
The ANOVA Procedure
Dependent Variable: Y
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 62.50000000 62.50000000 25.00 0.0011
Error 8 20.00000000 2.50000000
Corrected Total 9 82.50000000
R-Square Coeff Var Root MSE Y Mean
0.757576 28.74798 1.581139 5.500000
Source DF Anova SS Mean Square F Value Pr > F
X 1 62.50000000 62.50000000 25.00 0.0011
------
Notice that the ANOVA F is simply the square of the independent-samples t (-5.00 squared is 25.00), and the one-tailed p from the ANOVA is identical to the two-tailed p from the t.
And finally, a replication of the ANOVA with a regression analysis:
PROC REG; MODEL Y = X; run;
------
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: Y
Number of Observations Read 10
Number of Observations Used 10
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 62.50000 62.50000 25.00 0.0011
Error 8 20.00000 2.50000
Corrected Total 9 82.50000
Root MSE 1.58114 R-Square 0.7576
Dependent Mean 5.50000 Adj R-Sq 0.7273
Coeff Var 28.74798
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -2.00000 1.58114 -1.26 0.2415
X 1 5.00000 1.00000 5.00 0.0011
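Notice that the regression again reproduces the ANOVA: F = 25.00 with p = 0.0011, and the t for the slope, 5.00, is the pooled-variances t apart from its sign. Because X is coded 1 and 2, the slope is the difference between the two group means (8 - 3 = 5) and the intercept is an extrapolation back to X = 0. As a small extension that is not part of the original lesson (the data set DUMMY and the variable D are names introduced here for illustration), recoding the group variable as a 0/1 dummy makes the intercept the mean of group 1 and the slope the mean difference:
* Not in the original lesson: dummy-coded version of the same regression;
DATA DUMMY; SET TWOSAMPLE; D = (X = 2); RUN;   * D = 0 for group 1, 1 for group 2;
PROC REG DATA=DUMMY; MODEL Y = D; RUN;         * intercept = 3 (group 1 mean), slope = 5 (mean difference), t = 5.00;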
OK, but what if we have more than two groups? Show me that the ANOVA is a regression analysis in that case.
Here is the SAS program, with data:
data Lotus;
input Dose N; Do I=1 to N; Input Illness @@; output; end;
cards;
0 20
101 101 101 104 104 105 110 111 111 113 114 79 89 91 94 95 96 99 99 99
10 20
100 65 65 67 68 80 81 82 85 87 87 88 88 91 92 94 95 94 96 96
20 20
64 75 75 76 77 79 79 80 80 81 81 81 82 83 83 85 87 88 90 96
30 20
100 105 108 80 82 85 87 87 87 89 90 90 92 92 92 95 95 97 98 99
40 20
101 102 102 105 108 109 112 119 119 123 82 89 92 94 94 95 95 97 98 99
;
*****************************************************************************;
proc GLM data=Lotus; class Dose;
model Illness = Dose / ss1;
title 'Here we have a traditional one-way independent samples ANOVA'; run;
*****************************************************************************;
data Polynomial; set Lotus; Quadratic=Dose*Dose; Cubic=Dose**3; Quartic=Dose**4;
proc GLM data=Polynomial; model Illness = Dose Quadratic Cubic Quartic / ss1;
title 'Here we have a polynomial regression analysis.'; run;
*****************************************************************************;
Here is the output:
Here we have a traditional one-way independent samples ANOVA
The GLM Procedure
Dependent Variable: Illness
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 4 6791.54000 1697.88500 20.78 <.0001
Error 95 7762.70000 81.71263
Corrected Total 99 14554.24000
R-Square Coeff Var Root MSE Illness Mean
0.466637 9.799983 9.039504 92.24000
Source DF Type I SS Mean Square F Value Pr > F
Dose 4 6791.540000 1697.885000 20.78 <.0001
------
Here we have a polynomial regression analysis.
The GLM Procedure
Number of observations 100
Dependent Variable: Illness
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 4 6791.54000 1697.88500 20.78 <.0001
Error 95 7762.70000 81.71263
Corrected Total 99 14554.24000
Note that the polynomial regression produced exactly the same F, p, SS, and MS as the traditional ANOVA.
R-Square Coeff Var Root MSE Illness Mean
0.466637 9.799983 9.039504 92.24000
Source DF Type I SS Mean Square F Value Pr > F
Dose 1 174.845000 174.845000 2.14 0.1468
Quadratic 1 6100.889286 6100.889286 74.66 <.0001
Cubic 1 389.205000 389.205000 4.76 0.0315
Quartic 1 126.600714 126.600714 1.55 0.2163
------
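A related extension, sketched here and not part of the original lesson: because the five doses are equally spaced and every group has n = 20, CONTRAST statements with orthogonal polynomial coefficients in the CLASS-variable GLM should reproduce the linear, quadratic, cubic, and quartic Type I sums of squares shown above.
proc glm data=Lotus; class Dose;                 * not in the original lesson;
  model Illness = Dose;
  contrast 'Linear'    Dose -2 -1  0  1  2;      * orthogonal polynomial coefficients for 5 equally spaced levels;
  contrast 'Quadratic' Dose  2 -1 -2 -1  2;
  contrast 'Cubic'     Dose -1  2  0 -2  1;
  contrast 'Quartic'   Dose  1 -4  6 -4  1;
  title 'Trend contrasts on the classification variable'; run;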
November, 2006