Simulating Data For A Discriminant Function Analysis
Start with the desired intercorrelation matrix. One may use a correlation matrix obtained from actual research or contrive one. In the latter case, keep in mind that it is possible to contrive an impossible correlation matrix, that is, one that is not positive semidefinite and so could not arise from any real set of scores. It is safest to use a correlation matrix that has been obtained from actual data. I often use a correlation matrix computed on data from my own research or one taken from the published research of others. For the example below, I have used contrived correlations among students' logical thinking ability, creativity, IQ, and political liberalism, all standardized to mean 0 and variance 1.
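One way to screen a contrived matrix before using it is to attempt a Cholesky factorization, which succeeds only when the matrix is positive definite. The sketch below is in Python rather than SAS and is only an illustration, not part of the original program. The function name is_positive_definite and the three-variable "impossible" matrix are made up for the example. Note that the check is strict: a singular matrix (one variable perfectly predictable from the others) is rejected along with truly impossible ones.

```python
import math


def is_positive_definite(r):
    """Return True if the symmetric matrix r has a Cholesky factor.

    Hypothetical helper for illustration: the factorization breaks down
    (a pivot is zero or negative) exactly when r is not positive definite.
    """
    n = len(r)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = r[i][i] - s
                if pivot <= 1e-12:  # zero or negative pivot: not positive definite
                    return False
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (r[i][j] - s) / chol[j][j]
    return True


# The contrived matrix used in this handout
# (order: Logic, Creativity, IQ, Liberalism).
example = [
    [1.00, 0.28, 0.38, 0.32],
    [0.28, 1.00, 0.44, 0.29],
    [0.38, 0.44, 1.00, 0.33],
    [0.32, 0.29, 0.33, 1.00],
]

# A made-up matrix that cannot occur: two variables that both correlate
# +.9 with a third cannot correlate -.9 with each other.
impossible = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, -0.9],
    [0.9, -0.9, 1.0],
]

print(is_positive_definite(example))
print(is_positive_definite(impossible))
```

The matrix used in this handout passes the check; the made-up three-variable matrix does not.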
options pageno=min nodate formdlim='-';
DATA DFA(TYPE=CORR);
LENGTH _NAME_ $ 12;
*Specify the length of the longest variable name, in this case Z_Creativity (12 characters);
INPUT _TYPE_ $ _NAME_ $ Z_Logic Z_Creativity Z_IQ Z_Liberalism; cards;
CORR Z_Logic 1.00 0.28 0.38 0.32
CORR Z_Creativity 0.28 1.00 0.44 0.29
CORR Z_IQ 0.38 0.44 1.00 0.33
CORR Z_Liberalism 0.32 0.29 0.33 1.00
;
PROC REG;
A: MODEL Z_Liberalism=Z_Logic--Z_IQ;
B: MODEL Z_IQ=Z_Creativity Z_Logic;
C: MODEL Z_Creativity=Z_Logic;
run;
The SAS System 1
The REG Procedure
Model: A
Dependent Variable: Z_Liberalism
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 1710.77405 570.25802 687.76 <.0001
Error 9996 8288.22595 0.82915
Corrected Total 9999 9999.00000
Root MSE 0.91058 R-Square 0.1711
Dependent Mean 0 Adj R-Sq 0.1708
Coeff Var .
Parameter Estimates
Variable DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept 1 0 0.00911 0.00 1.0000
Z_Logic 1 0.20760 0.00994 20.89 <.0001
Z_Creativity 1 0.15052 0.01024 14.71 <.0001
Z_IQ 1 0.18488 0.01062 17.40 <.0001
------
The SAS System 2
The REG Procedure
Model: B
Dependent Variable: Z_IQ
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 2651.29734 1325.64867 1803.63 <.0001
Error 9997 7347.70266 0.73499
Corrected Total 9999 9999.00000
Root MSE 0.85732 R-Square 0.2652
Dependent Mean 0 Adj R-Sq 0.2650
Coeff Var .
Parameter Estimates
Variable DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept 1 0 0.00857 0.00 1.0000
Z_Creativity 1 0.36198 0.00893 40.53 <.0001
Z_Logic 1 0.27865 0.00893 31.20 <.0001
------
The SAS System 3
The REG Procedure
Model: C
Dependent Variable: Z_Creativity
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 783.92160 783.92160 850.52 <.0001
Error 9998 9215.07840 0.92169
Corrected Total 9999 9999.00000
Root MSE 0.96005 R-Square 0.0784
Dependent Mean 0 Adj R-Sq 0.0783
Coeff Var .
Parameter Estimates
Variable DF Parameter Estimate Standard Error t Value Pr > |t|
Intercept 1 0 0.00960 0.00 1.0000
Z_Logic 1 0.28000 0.00960 29.16 <.0001
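The parameter estimates and error weights in the output above can be recovered directly from the correlation matrix, because with standardized variables the regression weights solve the normal equations Rxx * b = rxy and R-Square is the sum of each weight times the corresponding predictor-criterion correlation. The Python sketch below illustrates that arithmetic; it is not what PROC REG does internally, and the helper names solve and standardized_fit are hypothetical. The error weight is computed as sqrt(1 - R-Square), which differs from the SAS Root MSE by less than .001 here, because SAS divides the error sum of squares by the error degrees of freedom rather than by the 9,999 total degrees of freedom.

```python
import math


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardized_fit(r, predictors, criterion):
    """Return (weights, R-square, error weight) for standardized variables."""
    rxx = [[r[i][j] for j in predictors] for i in predictors]
    rxy = [r[i][criterion] for i in predictors]
    beta = solve(rxx, rxy)
    r_square = sum(b * c for b, c in zip(beta, rxy))
    return beta, r_square, math.sqrt(1.0 - r_square)


# Correlation matrix from the handout; 0=Logic, 1=Creativity, 2=IQ, 3=Liberalism.
R = [
    [1.00, 0.28, 0.38, 0.32],
    [0.28, 1.00, 0.44, 0.29],
    [0.38, 0.44, 1.00, 0.33],
    [0.32, 0.29, 0.33, 1.00],
]

model_a = standardized_fit(R, [0, 1, 2], 3)  # Liberalism from Logic, Creativity, IQ
model_b = standardized_fit(R, [0, 1], 2)     # IQ from Logic, Creativity
model_c = standardized_fit(R, [0], 1)        # Creativity from Logic

for name, (beta, r_square, err) in zip("ABC", (model_a, model_b, model_c)):
    weights = ", ".join(f"{b:.5f}" for b in beta)
    print(f"Model {name}: weights {weights}; R-square {r_square:.4f}; error weight {err:.4f}")
```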
Now we use the parameter estimates and Root MSE values from the output above to simulate data for the groups in our discriminant function analysis. Each simulated variable is a weighted sum of the variables generated before it, with the parameter estimates as the weights, plus random normal error weighted by the Root MSE. The pattern of correlations among the simulated variables will be the same within each group (ignoring the effects of sampling error). I enclosed the commands that simulate the standardized scores in a macro (%zz) and then called it for each major. I changed the means (except for IQ) from major to major but kept the variances stable. I included invocations of Proc Corr and Proc Discrim to conduct the initial analysis of the data. Following the program are the first five lines of the resulting data file and selected parts of the statistical output.
options pageno=min nodate formdlim='-';
%macro zz;
Z_Logic=normal(0);
Z_Creativity=.28*Z_Logic+.96*normal(0);
Z_IQ=.362*Z_Creativity + .279*Z_Logic + .857*normal(0);
Z_Liberalism=.208*Z_Logic+.151*Z_Creativity+.185*Z_IQ+.911*normal(0);
%mend zz;
DATA DFA;
LENGTH MAJOR $ 10;
*Specify the length of the longest value for variable Major;
Major = 'Art'; do s=1 to 100; %zz
Logic=round(500+70*Z_Logic);
Creativity=round(60+10*Z_Creativity);
IQ=round(115+8*Z_IQ);
Liberalism=round(1 + 1.2*Z_Liberalism);
OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;
Major = 'Psychology'; do s=1 to 100; %zz
Logic=round(550+70*Z_Logic);
Creativity=round(50+10*Z_Creativity);
IQ=round(115+8*Z_IQ);
Liberalism=round(.5 + 1.2*Z_Liberalism);
OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;
Major = 'Scatology'; do s=1 to 100; %zz
Logic=round(600+70*Z_Logic);
Creativity=round(40+10*Z_Creativity);
IQ=round(115+8*Z_IQ);
Liberalism=round(0 + 1.2*Z_Liberalism);
OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;
proc corr; var Logic -- Liberalism; by Major;
proc discrim anova canonical;
class Major; var Logic -- Liberalism;
run;
Art 480 77 127 0
Art 543 53 120 2
Art 615 57 115 2
Art 368 61 114 -1
Art 543 60 136 1
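As a check on the logic of the %zz macro, the Python sketch below mirrors its four assignment statements and prints the correlations among the standardized scores in one large simulated sample, where they should fall close to the target values. It assumes Python's random.gauss as a stand-in for the SAS normal(0) function. The function names, the seed, and the sample size of 20,000 are arbitrary choices for illustration and are not part of the original program.

```python
import math
import random


def simulate_z(rng):
    """One case of standardized scores, built the same way as the %zz macro."""
    z_logic = rng.gauss(0, 1)
    z_creativity = 0.28 * z_logic + 0.96 * rng.gauss(0, 1)
    z_iq = 0.362 * z_creativity + 0.279 * z_logic + 0.857 * rng.gauss(0, 1)
    z_liberalism = (0.208 * z_logic + 0.151 * z_creativity
                    + 0.185 * z_iq + 0.911 * rng.gauss(0, 1))
    return z_logic, z_creativity, z_iq, z_liberalism


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2009)  # arbitrary seed so the run is repeatable
rows = [simulate_z(rng) for _ in range(20000)]
cols = list(zip(*rows))    # columns: Logic, Creativity, IQ, Liberalism

# Print the simulated correlation matrix for comparison with the targets.
for i in range(4):
    print(" ".join(f"{correlation(cols[i], cols[j]):5.2f}" for j in range(4)))
```

With only 100 cases per major, as in the program above, the sample correlations stray much farther from the targets, which is the sampling error visible in the Proc Corr output.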
The SAS System 1
------MAJOR=Art ------
The CORR Procedure
4 Variables: Logic Creativity IQ Liberalism
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Logic 100 503.85000 66.48587 50385 359.00000 673.00000
Creativity 100 59.00000 10.34359 5900 31.00000 81.00000
IQ 100 113.85000 7.99037 11385 95.00000 136.00000
Liberalism 100 0.89000 1.23005 89.00000 -2.00000 3.00000
Pearson Correlation Coefficients, N = 100
Prob > |r| under H0: Rho=0
Logic Creativity IQ Liberalism
Logic 1.00000 0.16593 0.21738 0.28919
0.0990 0.0298 0.0035
Creativity 0.16593 1.00000 0.49950 0.23579
0.0990 <.0001 0.0182
IQ 0.21738 0.49950 1.00000 0.34362
0.0298 <.0001 0.0005
Liberalism 0.28919 0.23579 0.34362 1.00000
0.0035 0.0182 0.0005
------
The SAS System 2
------MAJOR=Psychology ------
The CORR Procedure
4 Variables: Logic Creativity IQ Liberalism
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Logic 100 544.65000 69.52964 54465 408.00000 736.00000
Creativity 100 49.08000 10.33253 4908 14.00000 71.00000
IQ 100 116.00000 8.52684 11600 96.00000 133.00000
Liberalism 100 0.51000 1.28311 51.00000 -3.00000 4.00000
Pearson Correlation Coefficients, N = 100
Prob > |r| under H0: Rho=0
Logic Creativity IQ Liberalism
Logic 1.00000 0.29143 0.53782 0.44744
0.0033 <.0001 <.0001
Creativity 0.29143 1.00000 0.54240 0.23384
0.0033 <.0001 0.0192
IQ 0.53782 0.54240 1.00000 0.32960
<.0001 <.0001 0.0008
Liberalism 0.44744 0.23384 0.32960 1.00000
<.0001 0.0192 0.0008
------
The SAS System 3
------MAJOR=Scatology ------
The CORR Procedure
4 Variables: Logic Creativity IQ Liberalism
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Logic 100 612.86000 65.30875 61286 478.00000 779.00000
Creativity 100 40.14000 9.93415 4014 15.00000 67.00000
IQ 100 115.93000 7.04711 11593 98.00000 132.00000
Liberalism 100 0.19000 1.24475 19.00000 -3.00000 3.00000
Pearson Correlation Coefficients, N = 100
Prob > |r| under H0: Rho=0
Logic Creativity IQ Liberalism
Logic 1.00000 0.13674 0.32258 0.27456
0.1749 0.0011 0.0057
Creativity 0.13674 1.00000 0.54496 0.23472
0.1749 <.0001 0.0187
IQ 0.32258 0.54496 1.00000 0.36541
0.0011 <.0001 0.0002
Liberalism 0.27456 0.23472 0.36541 1.00000
0.0057 0.0187 0.0002
------
The DISCRIM Procedure
Univariate Test Statistics
F Statistics, Num DF=2, Den DF=297
            Total      Pooled     Between
            Standard   Standard   Standard             R-Square
Variable    Deviation  Deviation  Deviation  R-Square  / (1-RSq)  F Value  Pr > F
Logic         80.6570    67.1316    55.0763    0.3119     0.4533    67.31  <.0001
Creativity    12.7665    10.2052     9.4342    0.3653     0.5755    85.46  <.0001
IQ             7.9155     7.8786     1.2216    0.0159     0.0162     2.40  0.0921
Liberalism     1.2811     1.2528     0.3504    0.0500     0.0527     7.82  0.0005
------
The SAS System 7
The DISCRIM Procedure
Canonical Discriminant Analysis
                Adjusted     Approximate  Squared
   Canonical    Canonical    Standard     Canonical
   Correlation  Correlation  Error        Correlation
1  0.780395     0.777526     0.022611     0.609016
2  0.158556     0.138597     0.056378     0.025140
Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)
   Eigenvalue  Difference  Proportion  Cumulative
1  1.5577      1.5319      0.9837      0.9837
2  0.0258                  0.0163      1.0000
Test of H0: The canonical correlations in the current row and all that follow are zero
   Likelihood  Approximate
   Ratio       F Value      Num DF  Den DF  Pr > F
1  0.38115453  45.55        8       588     <.0001
2  0.97486008   2.54        3       295     0.0570
------
The SAS System 8
The DISCRIM Procedure
Canonical Discriminant Analysis
Pooled Within Canonical Structure
Variable Can1 Can2
Logic 0.534605 -0.559819
Creativity -0.607345 -0.189969
IQ 0.086242 0.422544
Liberalism -0.183583 -0.085251
------
Class Means on Canonical Variables
MAJOR Can1 Can2
Art -1.511826518 -0.114979667
Psychology -0.017970750 0.225954333
Scatology 1.529797267 -0.110974666
------
Number of Observations and Percent Classified into MAJOR
From MAJOR Art Psychology Scatology Total
Art 74 24 2 100
74.00 24.00 2.00 100.00
Psychology 24 52 24 100
24.00 52.00 24.00 100.00
Scatology 1 20 79 100
1.00 20.00 79.00 100.00
Total 99 96 105 300
33.00 32.00 35.00 100.00
Priors 0.33333 0.33333 0.33333
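Several figures in the DISCRIM output above can be verified by hand from other figures in the same output. The Python sketch below recomputes the eigenvalues from the squared canonical correlations using the formula printed in the output, CanRsq/(1-CanRsq), forms the likelihood ratio for the first test as the product of the (1 - CanRsq) terms, and computes the overall proportion of correct classifications from the diagonal of the classification table. The variable names are illustrative; the numbers are copied from the output.

```python
# Squared canonical correlations, copied from the output above.
can_rsq = [0.609016, 0.025140]

# Eigenvalue of Inv(E)*H for each function = CanRsq / (1 - CanRsq).
eigenvalues = [r / (1.0 - r) for r in can_rsq]

# Likelihood ratio (Wilks' lambda) for the test that all canonical
# correlations are zero = product of (1 - CanRsq) over all functions.
wilks_lambda = 1.0
for r in can_rsq:
    wilks_lambda *= (1.0 - r)

# Classification table: rows are the actual major, columns the predicted major.
majors = ["Art", "Psychology", "Scatology"]
table = [
    [74, 24, 2],
    [24, 52, 24],
    [1, 20, 79],
]
hits = sum(table[i][i] for i in range(len(majors)))  # 74 + 52 + 79 = 205
total = sum(sum(row) for row in table)               # 300 cases in all
hit_rate = hits / total

print("eigenvalues:", [round(e, 4) for e in eigenvalues])
print("Wilks' lambda:", round(wilks_lambda, 6))
print("correctly classified:", hits, "of", total, f"({hit_rate:.1%})")
```

With three groups and equal priors, about one third of the cases would be classified correctly by chance, so the hit rate computed here is roughly double the chance rate.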
Required Research Presentation in PSYC 7433
Copyright 2009, Karl L. Wuensch - All rights reserved.