Dr. Lakey – Diekhoff (1996) – Chapter 10

CORRELATION

“Co-Variation” “Linked” “Go-Together”
“Relationship” “Association”

Many Different Indices of Correlation:

Pearson’s r, Spearman’s r, Cramer’s V,

the Correlation Ratio (Eta), etc.

Different Correlation Coefficients

for

  • Nominal, Ordinal, Interval, and Ratio Data
  • Linear and Curvilinear Relationships
  • Continuous and Dichotomous Variables
  • Two Variables (Bivariate r)

or Three or More Variables (Multivariate R)

Also
  • “Part” Multivariate Correlation (the correlation of one X variable with Y, with the other X’s removed from that X only)
  • “Partial” Multivariate Correlation (the correlation of one X variable with Y, with the other X’s held constant, i.e., removed from both X and Y)
  • If Reliable Correlation, then can Predict Y from X (or X’s):

Regression Analysis (next chapter)
10.1 Pearson Product-Moment Correlation

rp is the most widely used correlation coefficient or index…

For Basic Understanding:

(1) See Table 10.1 (p305): The Paired Data Set:

Each Case/Subject (numbered) has Two Scores on Two Different Measurements/Tests

(2) See Figure 10.1 (p305): The Scatterplot:

Each Point (numbered) is that One Case/Subject showing One Score on the X-Axis and showing the Other Paired Score on the Y-Axis.

Always Look at the Scatterplot! Look for the…

  • Direction of the Relationship

Positive Correlation: As X increases, Y increases.

Negative Correlation: As X increases, Y decreases.

  • Strength of the Relationship

Scatter of Points about the Best-Fitting Line.

Slope of the Best-Fitting Line.

Slope of One > Highest r (Fig. 10.3a).

Slope of Zero > Lowest r (Fig. 10.4 p309).

(A Horizontal or Vertical Best-Fitting Line Means r = 0)

  • Linear or Curvilinear Relationships

Pearson r only for Linear Relationships.

(Eta or other for Curvilinear Relationships).

Fig. 10.5: rp should not be used!

(under-estimates the true correlation)

  • Homoscedasticity vs Heteroscedasticity

Constant Scatter about Best-Fitting Line

from One End to the Other.

See Fig. 10.6 (p310)

As scatter affects r, homoscedasticity means a

Constant r Across the Range of X.

  • Outliers?

Outliers Disproportionately Affect r!

Usually data-entry errors…

If not, then may be an important exception!

10.2 Computing Pearson Correlation Coefficient

Computers and Calculators do the work nowadays…

Equation 10.1 (p312) with z-score transformed data.

rp = SUM(zXzY) / N

[Note 2 Equation (p336) with untransformed data]

rp = An Average of the Products of X and Y z-scores.

rp highest when X and Y z-scores agree in their differences from the mean.
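A minimal Python sketch of Equation 10.1, computing rp as the mean product of paired z-scores; the scores below are hypothetical, not the Table 10.1 data:

```python
import statistics


def pearson_r(x, y):
    """Equation 10.1: r_p = SUM(z_X * z_Y) / N (population SDs, divide by N)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    sy = (sum((v - my) ** 2 for v in y) / n) ** 0.5
    zx = [(v - mx) / sx for v in x]   # z-score transform of X
    zy = [(v - my) / sy for v in y]   # z-score transform of Y
    return sum(a * b for a, b in zip(zx, zy)) / n


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```

When the paired z-scores match exactly, the average product (and thus rp) is exactly 1.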

Most Frequent Problems:

  • Restricted Variance = Restricted Range

Usually Lowers rp

  • Nonlinearity

Lowers rp

  • Outliers

Can Dramatically Lower or Raise rp
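A hypothetical numeric illustration of the outlier point: a single wild pair can flip a perfect negative correlation into a strong positive one (pearson_r here is the deviation-score form of rp):

```python
def pearson_r(x, y):
    """Pearson r from deviation scores (equivalent to the z-score formula)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


x, y = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
print(pearson_r(x, y))                            # -1.0: perfect negative
print(round(pearson_r(x + [20], y + [20]), 2))    # 0.92 after one outlier
```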

Statistical Significance

of rp

Is the correlation “real” (|rp| > 0) or not (rp = 0)?

Sampling Distribution of rp

Fig. 10.9 (p318) and Table 8 (p417, with df = N - 2)

(Can also use t-Test re Note 3 Equation p336f)
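The Note 3 equation itself is not reproduced in these notes, but the standard t test for a correlation is t = r·sqrt(N - 2) / sqrt(1 - r²), with df = N - 2; a sketch with hypothetical values:

```python
import math


def t_for_r(r, n):
    """t test of H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical example: r = .50 from N = 18 pairs (df = 16)
print(round(t_for_r(0.50, 18), 2))  # 2.31
```

With df = 16 this exceeds the two-tailed .05 critical value of about 2.12, so such a correlation would be judged significant.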

10.3 Spearman Rank-Order Correlation

rs (aka Spearman’s rho) for ordinal and/or monotonic nonlinear and/or nonnormal data…

Equation 10.2 (p323)?

Better to use Pearson’s Equation (Note 2 p336)!

Monotonic Nonlinear vs Non-Monotonic Curvilinear Relationships

Fig. 10.5 (p309)

No Inversion (a) vs Inversion (b) of Best-Fitting Function

(a) Monotonic: Either for every increase in X there is an increase in Y, or for every increase in X there is a decrease in Y (no inversions).

(b) Non-Monotonic: For some increases in X, Y increases, and for other increases in X, Y decreases (inversions).

Transformations By Rank-Ordering Data

(a) Monotonic Nonlinear Data > Rank-Order Transformation > Now Linear!

(b) Non-Monotonic Nonlinear Data > Rank-Order Transformation > Still Nonlinear!
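A sketch of why ranking linearizes monotonic data: rs is simply Pearson’s r computed on the ranks (hypothetical scores, no ties assumed):

```python
def rank(data):
    """Ranks 1..N (assumes no ties, for brevity)."""
    order = sorted(range(len(data)), key=data.__getitem__)
    r = [0] * len(data)
    for pos, idx in enumerate(order, start=1):
        r[idx] = pos
    return r


def pearson_r(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]              # monotonic but nonlinear (y = x^2)
print(round(pearson_r(x, y), 3))     # 0.981 on the raw scores
print(pearson_r(rank(x), rank(y)))   # 1.0 on the ranks: now perfectly linear
```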

Statistical Significance of rs

Example 10.2

Prof. Yogi Zen at Rocky Bottom State U.

Studies Self-Aggrandizement and Productivity

Equation 10.2 (p323)

[Better to use Pearson’s Equation (Note 2 p336)]

Hand Calculators with r-function keys Work!

rs = -.42

Table 9 (p418 with N = Npairs)

[Without Table 9 Can also use t Test (Note 6 & 3 p337)]

APA Results Section: “The predicted correlation between Self-Aggrandizement and Scholarly Productivity was not significant [Spearman’s r (10) = -.42, p > .05].”

10.4 Chi-Square Test of Association and Cramer’s V Revisited

For Nominal Data…

Cramer’s V simply re-scales Chi2 (and phi) to limit the range from 0 to 1 (for any R x C Table)

Revisited from Chapter 7

Recall Lab Assignment 10 in

Pavkov & Pierce (2000):

SPSS Crosstabs Procedure.

Contingency Tables

Table 10.4 (p327)

Example 10.3

Connum, Duppum, and Lie Marketing

Age Group and Preferred Cola

(1) Construct Contingency Table of fo (p332).

(2) Compute ( fe ) for each Cell:

(RowTotal / NTotal) x ColumnTotal.

(3) Check for Small fe

  • 2x2 Table: All ( fe ) at least 5?
  • Larger Table: 80% of ( fe ) at least 5?

Only one cell < 5, so 88% OK!

(4) Compute Chi2 (Equation 7.7 p211)

Chi2 = SUM[ (fo - fe)2 / fe ]

Chi2 = 41.75

(5) Assess Chi2 with Table 3

with df = (R-1)(C-1) = 4: p < .01**

(6) Compute Cramer’s V (Equation 7.10 p218)

Cramer’s V = [Chi2 / N(k-1)]1/2, where k is the smaller of the number of rows or columns

Cramer’s V = .56
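Steps (1)-(6) can be sketched as follows; the 2x3 table of observed frequencies below is hypothetical, not the cola data of Example 10.3:

```python
def chi_square_and_v(table):
    """Chi-square test of association plus Cramer's V for a contingency table."""
    rows = [sum(r) for r in table]          # row totals
    cols = [sum(c) for c in zip(*table)]    # column totals
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = rows[i] * cols[j] / n      # expected frequency for this cell
            chi2 += (fo - fe) ** 2 / fe
    k = min(len(rows), len(cols))           # smaller of R, C
    v = (chi2 / (n * (k - 1))) ** 0.5       # Cramer's V
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, v, df


table = [[20, 10, 5],
         [5, 10, 20]]
chi2, v, df = chi_square_and_v(table)
print(round(chi2, 2), round(v, 2), df)      # 18.0 0.51 2
```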

Post-Hoc Tests?

APA Report: A significant correlation was found between preferred cola and age group [Chi2(4, N=67) = 41.75, p < .01, Cramer’s V = .56]. Follow-up pairwise Chi2 comparisons were not conducted (but should have been, using Holm’s Bonferroni method to control for Type I errors). “It appears that the youngest consumers prefer Burpee Cola, the oldest prefer Same Ol’Sudge, and those of intermediate age prefer Old Brown.”

The Correlation Ratio Eta

For Nonlinear or Possible Curvilinear Relationships…

Eta = (SSgroups / SStotal)1/2
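A minimal sketch of the correlation ratio: group the Y scores by X category and take Eta = (SSgroups / SStotal)1/2. The scores below are hypothetical, with Y rising then falling across three X groups (a curvilinear pattern Pearson’s r would understate):

```python
def eta(groups):
    """Correlation ratio: groups is a list of lists of Y scores, one per X group."""
    all_scores = [y for g in groups for y in g]
    grand = sum(all_scores) / len(all_scores)
    ss_total = sum((y - grand) ** 2 for y in all_scores)
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return (ss_groups / ss_total) ** 0.5


# Y rises then falls across the three X groups (curvilinear)
print(round(eta([[1, 2, 3], [8, 9, 10], [2, 3, 4]]), 3))  # 0.967
```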

10.5 Correlation and Causation

Ice Cream and Drowning?

Smoking and Colds?

If X causes Y, then X and Y are correlated.

If X and Y are correlated, then one may cause the other.

Causal connections are established by conducting experiments: Random assignment to groups with manipulation of the independent variable for each group. Can assess either significant differences or significant correlation!

Patients randomly assigned to different drug dosage (X) groups and their depression (Y) subsequently measured: a significant correlation would indicate a causal connection between the drug and depression.

The reverse is also true: If Smoking and Colds are significantly correlated, the mean colds experienced by Smoking and Non-Smoking groups will be significantly different! (Smoking is a “subject variable” not a manipulated “experimental” variable).

10.6 Statistical vs. Practical Significance of Correlation

With large samples,

even very small values of r are statistically significant.

Usually,

correlation coefficients of .10, .30, and .50, irrespective of sign, are interpreted

as “small,” “medium,” and “large” coefficients, respectively.