The purpose of this assignment is to refresh your memory about interpreting the Pearson correlation coefficient. While this technique is normally used with variables measured at the interval level, there is an associated technique, the point-biserial correlation, which is used when one of the variables is dichotomous. When the variables are described as dependent and independent, point-biserial is used when the independent variable is a true dichotomy such as gender. Let's look at the Pearson correlation first:
Although correlation is a symmetric procedure and computationally does not reflect independence-dependence, let's use that language here. In the tables below, education, in years, is the independent variable and income, in dollars, is the dependent variable.
The N is acceptably large, so no worry there. The mean level of education is 13.49 years, or about 1½ years into college, with an S.D. of almost 3 years. Does the S.D. seem small or large to you? Now take a look at income, with a mean of $40,445 and an S.D. of $29,631. What do you think about this S.D.? What might have produced such a relatively large S.D.? Here is the frequency run for just the income variable:
Does this help you at all in interpreting the standard deviation?
Moving on, we see that the correlation is +.367. Recalling that a correlation coefficient can range from -1 to +1 (that is, from 0 to 1 in absolute value, with the sign giving the direction), does this seem large or small to you?
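As a reminder of what the coefficient measures, here is a minimal sketch of the Pearson calculation in Python. The data are made up for illustration and are not the survey values; the function name pearson_r is just a label chosen here. Reversing the sign of one variable flips the sign of r without changing its size.

```python
import math

def pearson_r(x, y):
    """Pearson r: sum of cross-products over the root of the product of sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (NOT the survey values)
educ = [8, 10, 12, 12, 14, 16, 16, 18]
income = [15000, 22000, 30000, 18000, 41000, 38000, 70000, 52000]

r = pearson_r(educ, income)
# Negating one variable reverses the direction of the relationship
r_reversed = pearson_r(educ, [-v for v in income])

print(round(r, 3), round(r_reversed, 3))
```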
If we square this number and multiply the result by 100, we get the percentage of variance explained in one variable as a function of the other (variance is the dispersion of scores; recall what we said about using the mean as a best guess of a single score). In this case, about 13.5% of the variance in income is explained by our knowledge of educational level. Since there is a possible 100% of the variance to explain, how have we done here?
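The arithmetic is short enough to check by hand, but here it is as a small Python sketch using the coefficient reported above. The variable names are only illustrative.

```python
r = 0.367  # correlation between education and income, as reported above

r_squared = r ** 2                       # proportion of variance explained
explained_pct = r_squared * 100          # expressed as a percentage
unexplained_pct = 100 - explained_pct    # what education does not account for

print(f"r squared   = {r_squared:.4f}")
print(f"explained   = {explained_pct:.1f}%")
print(f"unexplained = {unexplained_pct:.1f}%")
```

Seeing the unexplained share next to the explained share may help you answer the question of how well we have done.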
What’s left is to determine whether the .367 is significantly different from 0, that is, whether we can be convinced that this finding is not due to chance. In other words, can we reject the null hypothesis? When you run this for the assignment, be sure to consider whether you want a one-tailed or a two-tailed test of significance. Remember, a two-tailed test says you do not wish to predict the direction. In this case I chose a one-tailed test because my hypothesis states a direction for the relationship between educational level and income. The table above shows that a coefficient of .367 is significant at the .001 level (at least at this level, because SPSS shows .000, which means the p-value is below .0005), so we can reject the null hypothesis.
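For those who want to see what sits behind the significance value, a common test converts r to a t statistic with N - 2 degrees of freedom: t = r * sqrt(N - 2) / sqrt(1 - r squared). The sketch below is only an illustration under stated assumptions: the sample size is a hypothetical placeholder (the actual N is in the SPSS table, not repeated here), and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples. SPSS uses the exact t distribution.

```python
import math

r = 0.367   # coefficient reported above
n = 1000    # HYPOTHETICAL sample size, for illustration only

# t statistic for testing H0: the population correlation is 0
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Large-sample normal approximation to the upper-tail probability
p_one_tailed = 0.5 * math.erfc(t / math.sqrt(2))
# A two-tailed test doubles it because either direction counts
p_two_tailed = 2 * p_one_tailed

print(f"t = {t:.2f}")
print(f"one-tailed p is about {p_one_tailed:.2e}")
print(f"two-tailed p is about {p_two_tailed:.2e}")
```

Notice that the one-tailed p is half the two-tailed p, which is why the choice of test matters most when a result is borderline.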
Let's move on to using a dichotomous variable. We will substitute gender for educational level and personal income for family income.
Let's look at the descriptives first. The variable rincom91 is coded, so here is the breakdown:
The mean for the overall sample is 12.8, or close to the $19,999 level. If we ask SPSS to break down rincom91 by sex, we get the following means: men 14.14, women 11.55.
The variance of rincom91 is 31.60. So, the question is, how much of this variance can we reduce by knowing the sex of the respondent? SPSS does not compute a point-biserial correlation as a separate procedure, but statisticians recommend simply using the Pearson correlation in this case: with the dichotomy coded as two numbers (here 1 and 2), the Pearson formula gives the same value as the point-biserial calculation. If we square the correlation coefficient, -.231, and multiply by 100, we get about 5.3%. So, of the total variance possible to explain, 100%, we explained 5.3% by knowing gender. The variance, or squared error, of rincom91 can be reduced from 31.60 to 29.93 because of gender.
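Here is a hedged Python sketch of both points in the paragraph above. The first part redoes the variance arithmetic with the reported figures; using the 5.3% as rounded reproduces the 29.93, while the unrounded r squared gives a value about 0.01 lower. The second part uses made-up data (not the survey) to compare the textbook point-biserial formula, (M_high - M_low) / S.D. * sqrt(p * q) with the S.D. computed over N, against plain Pearson r. The function names are labels chosen for this illustration.

```python
import math

# Part 1: variance arithmetic with the figures reported above
total_variance = 31.60
r = -0.231
pct_explained = round(r ** 2 * 100, 1)                     # about 5.3
residual_rounded = total_variance * (1 - pct_explained / 100)
residual_exact = total_variance * (1 - r ** 2)
print(pct_explained, round(residual_rounded, 2), round(residual_exact, 2))

# Part 2: point-biserial versus Pearson on MADE-UP data
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, y, low, high):
    n = len(y)
    y_low = [v for g, v in zip(group, y) if g == low]
    y_high = [v for g, v in zip(group, y) if g == high]
    p, q = len(y_high) / n, len(y_low) / n
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)  # S.D. over N
    mean_diff = sum(y_high) / len(y_high) - sum(y_low) / len(y_low)
    return mean_diff / sd_y * math.sqrt(p * q)

sex = [1, 1, 1, 1, 2, 2, 2, 2, 2]               # 1 = male, 2 = female
inc = [16, 14, 15, 13, 12, 11, 13, 10, 14]      # made-up income codes

r_pearson = pearson_r(sex, inc)
r_pb = point_biserial(sex, inc, low=1, high=2)
print(round(r_pearson, 4), round(r_pb, 4))
```

In this illustration the two coefficients agree to within floating-point rounding.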
Now, let's look at the sign of the coefficient. In point-biserial correlation, the interpretation is somewhat analogous to trying to predict level of income from someone’s membership in one of two groups, here male and female. Remember that for sex the codes are 1=male, 2=female. The sign is negative, so low values of income are associated with membership in the group with the higher code, here females. However, you have to be very cautious about this interpretation because the magnitude of the coefficient is quite low. I leave the test of significance to you.
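To see that the sign depends only on how the two groups are numbered, here is a small Python sketch with made-up data (not the survey). Recoding so that the other group gets the higher number flips the sign and leaves the magnitude alone. The function name pearson_r is a label chosen for this illustration.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# MADE-UP data: 1 = male, 2 = female, with the female group lower on income
sex = [1, 1, 1, 1, 2, 2, 2, 2]
income_code = [16, 14, 15, 13, 12, 11, 13, 10]

r_original = pearson_r(sex, income_code)

# Recode so that males get the higher number (male = 1, female = 0)
sex_recoded = [1 if s == 1 else 0 for s in sex]
r_recoded = pearson_r(sex_recoded, income_code)

print(round(r_original, 3), round(r_recoded, 3))
```

So always check the coding before you interpret the direction.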