Simulating Data for a Discriminant Function Analysis

Start with the desired intercorrelation matrix. One may use a correlation matrix obtained from actual research or contrive one. In the latter case, keep in mind that it is possible to contrive a correlation matrix that is impossible, that is, one that no real set of scores could produce, so it is safest to use a correlation matrix obtained from actual data. I often use a correlation matrix computed on data from some of my own research or one obtained from the published research of others. For the example below, I have used contrived correlations among students’ logical thinking ability, creativity, IQ, and political liberalism, all standardized to mean 0, variance 1.
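One way to check whether a contrived matrix is possible is to look at its eigenvalues: a legitimate correlation matrix has none that are negative. The following is only a minimal sketch, not part of the original program. It assumes that SAS/IML is licensed at your site, and the matrix name R and the printed messages are mine.

```sas
proc iml;
  /* The contrived correlations: Logic, Creativity, IQ, Liberalism */
  R = {1.00 0.28 0.38 0.32,
       0.28 1.00 0.44 0.29,
       0.38 0.44 1.00 0.33,
       0.32 0.29 0.33 1.00};
  ev = eigval(R);   /* eigenvalues of the symmetric matrix */
  print ev;
  /* A negative eigenvalue means no real data could produce this matrix */
  if min(ev) < 0 then print "Impossible matrix: at least one negative eigenvalue";
  else print "No negative eigenvalues";
quit;
```

If the smallest eigenvalue is negative, adjust the contrived correlations before going on to the regressions.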

options pageno=min nodate formdlim='-';

DATA DFA(TYPE=CORR);

LENGTH _NAME_ $ 12;

*Specify the length of the longest variable name, in this case Z_Creativity (12 characters);

INPUT _TYPE_ $ _NAME_ $ Z_Logic Z_Creativity Z_IQ Z_Liberalism; cards;

CORR Z_Logic 1.00 0.28 0.38 0.32

CORR Z_Creativity 0.28 1.00 0.44 0.29

CORR Z_IQ 0.38 0.44 1.00 0.33

CORR Z_Liberalism 0.32 0.29 0.33 1.00

PROC REG;

A: MODEL Z_Liberalism=Z_Logic--Z_IQ;

B: MODEL Z_IQ=Z_Creativity Z_Logic;

C: MODEL Z_Creativity=Z_Logic;

run;

The SAS System 1

The REG Procedure

Model: A

Dependent Variable: Z_Liberalism

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Pr > F

Model 3 1710.77405 570.25802 687.76 <.0001

Error 9996 8288.22595 0.82915

Corrected Total 9999 9999.00000

Root MSE 0.91058 R-Square 0.1711

Dependent Mean 0 Adj R-Sq 0.1708

Coeff Var .

Parameter Estimates

Parameter Standard

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0 0.00911 0.00 1.0000

Z_Logic 1 0.20760 0.00994 20.89 <.0001

Z_Creativity 1 0.15052 0.01024 14.71 <.0001

Z_IQ 1 0.18488 0.01062 17.40 <.0001

------

The SAS System 2

The REG Procedure

Model: B

Dependent Variable: Z_IQ

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Pr > F

Model 2 2651.29734 1325.64867 1803.63 <.0001

Error 9997 7347.70266 0.73499

Corrected Total 9999 9999.00000

Root MSE 0.85732 R-Square 0.2652

Dependent Mean 0 Adj R-Sq 0.2650

Coeff Var .

Parameter Estimates

Parameter Standard

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0 0.00857 0.00 1.0000

Z_Creativity 1 0.36198 0.00893 40.53 <.0001

Z_Logic 1 0.27865 0.00893 31.20 <.0001

------

The SAS System 3

The REG Procedure

Model: C

Dependent Variable: Z_Creativity

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Pr > F

Model 1 783.92160 783.92160 850.52 <.0001

Error 9998 9215.07840 0.92169

Corrected Total 9999 9999.00000

Root MSE 0.96005 R-Square 0.0784

Dependent Mean 0 Adj R-Sq 0.0783

Coeff Var .

Parameter Estimates

Parameter Standard

Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0 0.00960 0.00 1.0000

Z_Logic 1 0.28000 0.00960 29.16 <.0001

Now we use the parameter estimates and Root MSE values from the output above to simulate data for the groups in our discriminant function analysis. Each simulated variable is a weighted sum of the variables generated before it plus random normal error: the weights are the regression coefficients, and the error is multiplied by the Root MSE so that each variable keeps a variance of about 1. The pattern of correlations among the simulated variables will be the same within each group (ignoring the effects of sampling error). I enclosed the commands that simulate the standardized scores in a macro (%zz) and then called it for each major. I changed the means (except for IQ) from major to major but kept the variances constant. I included invocations of Proc Corr and Proc Discrim to conduct the initial analysis of the data. The program (the macro and the DATA DFA step) is followed by the first five lines of the resulting data file and selected parts of the statistical output.
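As a check on the weights, one could generate a much larger sample of standardized scores and confirm that the correlations come out close to those in the starting matrix. The following is a minimal sketch and not part of the original program. The data set name CHECK, the sample size of 100,000, and the seed 12345 are my assumptions, and it uses RAND('NORMAL') with CALL STREAMINIT in place of normal(0) only so that the run is reproducible.

```sas
data check;
  call streaminit(12345);   /* assumed seed, for a reproducible stream */
  do s = 1 to 100000;       /* assumed sample size, large to shrink sampling error */
    Z_Logic      = rand('NORMAL');
    Z_Creativity = .28*Z_Logic + .96*rand('NORMAL');
    Z_IQ         = .362*Z_Creativity + .279*Z_Logic + .857*rand('NORMAL');
    Z_Liberalism = .208*Z_Logic + .151*Z_Creativity + .185*Z_IQ + .911*rand('NORMAL');
    output;
  end;
  drop s;
run;

/* Means should be near 0, standard deviations near 1, and the
   correlations near the values entered in the TYPE=CORR data set */
proc corr data=check;
  var Z_Logic Z_Creativity Z_IQ Z_Liberalism;
run;
```

With a sample this large the obtained correlations should differ from the targets (.28, .38, .32, .44, .29, .33) only by small sampling and rounding error. With only 100 cases per major, as in the program proper, expect larger departures.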

options pageno=min nodate formdlim='-';

%macro zz;

Z_Logic=normal(0);

Z_Creativity=.28*Z_Logic+.96*normal(0);

Z_IQ=.362*Z_Creativity + .279*Z_Logic + .857*normal(0);

Z_Liberalism=.208*Z_Logic+.151*Z_Creativity+.185*Z_IQ+.911*normal(0);

%mend zz;

DATA DFA;

LENGTH MAJOR $ 10;

*Specify the length of the longest value for variable Major;

Major = 'Art'; do s=1 to 100; %zz

Logic=round(500+70*Z_Logic);

Creativity=round(60+10*Z_Creativity);

IQ=round(115+8*Z_IQ);

Liberalism=round(1 + 1.2*Z_Liberalism);

OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;

Major = 'Psychology'; do s=1 to 100; %zz

Logic=round(550+70*Z_Logic);

Creativity=round(50+10*Z_Creativity);

IQ=round(115+8*Z_IQ);

Liberalism=round(.5 + 1.2*Z_Liberalism);

OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;

Major = 'Scatology'; do s=1 to 100; %zz

Logic=round(600+70*Z_Logic);

Creativity=round(40+10*Z_Creativity);

IQ=round(115+8*Z_IQ);

Liberalism=round(0 + 1.2*Z_Liberalism);

OUTPUT; FILE 'E:\SimData\DFA.dat'; PUT Major Logic--Liberalism; END;

proc corr; var Logic -- Liberalism; by Major;

proc discrim anova canonical;

class Major; var Logic -- Liberalism;

run;

Art 480 77 127 0

Art 543 53 120 2

Art 615 57 115 2

Art 368 61 114 -1

Art 543 60 136 1

The SAS System 1

------MAJOR=Art ------

The CORR Procedure

4 Variables: Logic Creativity IQ Liberalism

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

Logic 100 503.85000 66.48587 50385 359.00000 673.00000

Creativity 100 59.00000 10.34359 5900 31.00000 81.00000

IQ 100 113.85000 7.99037 11385 95.00000 136.00000

Liberalism 100 0.89000 1.23005 89.00000 -2.00000 3.00000

Pearson Correlation Coefficients, N = 100

Prob > |r| under H0: Rho=0

Logic Creativity IQ Liberalism

Logic 1.00000 0.16593 0.21738 0.28919

0.0990 0.0298 0.0035

Creativity 0.16593 1.00000 0.49950 0.23579

0.0990 <.0001 0.0182

IQ 0.21738 0.49950 1.00000 0.34362

0.0298 <.0001 0.0005

Liberalism 0.28919 0.23579 0.34362 1.00000

0.0035 0.0182 0.0005

------

The SAS System 2

------MAJOR=Psychology ------

The CORR Procedure

4 Variables: Logic Creativity IQ Liberalism

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

Logic 100 544.65000 69.52964 54465 408.00000 736.00000

Creativity 100 49.08000 10.33253 4908 14.00000 71.00000

IQ 100 116.00000 8.52684 11600 96.00000 133.00000

Liberalism 100 0.51000 1.28311 51.00000 -3.00000 4.00000

Pearson Correlation Coefficients, N = 100

Prob > |r| under H0: Rho=0

Logic Creativity IQ Liberalism

Logic 1.00000 0.29143 0.53782 0.44744

0.0033 <.0001 <.0001

Creativity 0.29143 1.00000 0.54240 0.23384

0.0033 <.0001 0.0192

IQ 0.53782 0.54240 1.00000 0.32960

<.0001 <.0001 0.0008

Liberalism 0.44744 0.23384 0.32960 1.00000

<.0001 0.0192 0.0008

------

The SAS System 3

------MAJOR=Scatology ------

The CORR Procedure

4 Variables: Logic Creativity IQ Liberalism

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

Logic 100 612.86000 65.30875 61286 478.00000 779.00000

Creativity 100 40.14000 9.93415 4014 15.00000 67.00000

IQ 100 115.93000 7.04711 11593 98.00000 132.00000

Liberalism 100 0.19000 1.24475 19.00000 -3.00000 3.00000

Pearson Correlation Coefficients, N = 100

Prob > |r| under H0: Rho=0

Logic Creativity IQ Liberalism

Logic 1.00000 0.13674 0.32258 0.27456

0.1749 0.0011 0.0057

Creativity 0.13674 1.00000 0.54496 0.23472

0.1749 <.0001 0.0187

IQ 0.32258 0.54496 1.00000 0.36541

0.0011 <.0001 0.0002

Liberalism 0.27456 0.23472 0.36541 1.00000

0.0057 0.0187 0.0002

------

The DISCRIM Procedure

Univariate Test Statistics

F Statistics, Num DF=2, Den DF=297

Total Pooled Between

Standard Standard Standard R-Square

Variable Deviation Deviation Deviation R-Square / (1-RSq) F Value Pr > F

Logic 80.6570 67.1316 55.0763 0.3119 0.4533 67.31 <.0001

Creativity 12.7665 10.2052 9.4342 0.3653 0.5755 85.46 <.0001

IQ 7.9155 7.8786 1.2216 0.0159 0.0162 2.40 0.0921

Liberalism 1.2811 1.2528 0.3504 0.0500 0.0527 7.82 0.0005

------

The SAS System 7

The DISCRIM Procedure

Canonical Discriminant Analysis

Adjusted Approximate Squared

Canonical Canonical Standard Canonical

Correlation Correlation Error Correlation

1 0.780395 0.777526 0.022611 0.609016

2 0.158556 0.138597 0.056378 0.025140

Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)

Test of H0: The canonical correlations in the current row and all that follow are zero

   Eigenvalue  Difference  Proportion  Cumulative  Likelihood Ratio  Approximate F Value  Num DF  Den DF  Pr > F

1      1.5577      1.5319      0.9837      0.9837        0.38115453                45.55       8     588  <.0001

2      0.0258                  0.0163      1.0000        0.97486008                 2.54       3     295  0.0570

------

The SAS System 8

The DISCRIM Procedure

Canonical Discriminant Analysis

Pooled Within Canonical Structure

Variable Can1 Can2

Logic 0.534605 -0.559819

Creativity -0.607345 -0.189969

IQ 0.086242 0.422544

Liberalism -0.183583 -0.085251

------

Class Means on Canonical Variables

MAJOR Can1 Can2

Art -1.511826518 -0.114979667

Psychology -0.017970750 0.225954333

Scatology 1.529797267 -0.110974666

------

Number of Observations and Percent Classified into MAJOR

From MAJOR Art Psychology Scatology Total

Art 74 24 2 100

74.00 24.00 2.00 100.00

Psychology 24 52 24 100

24.00 52.00 24.00 100.00

Scatology 1 20 79 100

1.00 20.00 79.00 100.00

Total 99 96 105 300

33.00 32.00 35.00 100.00

Priors 0.33333 0.33333 0.33333

Required Research Presentation in PSYC 7433

Copyright 2009, Karl L. Wuensch - All rights reserved.
