Multiple Regression Midterm Review

Nadra Lisha & Alexis Alabastro (edits by DB)

What is rxy?

r is the correlation between x and y. It ranges from -1 to 1, indicating the extent to which two variables are linearly related. It is also referred to as Pearson’s product moment correlation coefficient.

What is (rxy)²?

r² is the proportion of variance of y explained by x with a linear model. In other words, how much of the variation observed in your ‘criterion’ variable (y) can be predicted with a linear model using the ‘predictor’ variable (x)?

What is “rho” = ρ?

Rho is the “true” population correlation. It ranges from -1 to 1.

What is Regression?

Regression is a statistical analysis where the value on one variable (Y) is predicted based on one or more other measured variables. Regression can be used for prediction, inference, hypothesis testing, and modeling of causal relationships.

What does a regression equation look like?

Ŷi = a + bxi

Ŷi = predicted value of y for a case (i) where the value on variable X is xi

a = constant (the intercept)

b = unstandardized ‘b weight’ = (rxy)(sy / sx) in the case of a single predictor

This formula defines the line that minimizes the sum of squared errors in predicting y, summed over every observed xi. In regression, error represents unexplained variation in the dependent variable.
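
For a concrete illustration, here is a minimal Python sketch that computes a and b from the formulas above (the data are made up):

  import numpy as np

  # hypothetical sample: hours studied (x) and exam score (y)
  x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
  y = np.array([60.0, 65.0, 70.0, 78.0, 85.0])

  r_xy = np.corrcoef(x, y)[0, 1]                  # Pearson correlation
  b = r_xy * (y.std(ddof=1) / x.std(ddof=1))      # unstandardized b weight
  a = y.mean() - b * x.mean()                     # constant; the line passes through the means

  y_hat = a + b * x                               # predicted values
  sse = np.sum((y - y_hat) ** 2)                  # the quantity the line minimizes
  print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {sse:.2f}")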

What are the assumptions of Regression?

  1. Normal distribution of residuals around the regression line
     Test: SPSS creates temporary variables that contain the standardized predicted scores from the regression (ZPred) and the standardized residuals from the regression (ZResid). Look at a plot of the residuals, with ZPred on the x-axis and ZResid on the y-axis; the points should be normally distributed around a horizontal line at zero for all values of ZPred (see the sketch after this list). You can also ask SPSS for a cumulative histogram of all residuals.
  2. Homoscedasticity
     Definition: the variance of errors around the regression line is equal for all values of ZPred.
     Test: look at a plot of the residuals. The points should be similarly scattered around the line at different values along the x-axis. This assumption is most important when using the regression equation to predict a single score.
  3. Independent observations
     Test: this comes from your research design. Are there repeated measures? Are any data points paired or yoked? We assume observations are independent.
  4. Normal distribution of the dependent variable (desirable, but not required)
     Why it matters: residuals are unlikely to be normally distributed if the DV is far from normal. Also, the size of a correlation is limited if two variables have distributions with different shapes.
     Test: look at a histogram of the DV and each predictor. Be alert for outliers that may distort the distribution shape and be unduly influential.
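
Here is a minimal Python sketch of the ZPred/ZResid plot from assumption 1, using simulated data that satisfy the assumptions (SPSS produces the equivalent plot for you):

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(1)
  x = rng.normal(size=200)
  y = 2.0 + 0.5 * x + rng.normal(size=200)    # simulated data meeting the assumptions

  b = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
  a = y.mean() - b * x.mean()
  y_hat = a + b * x
  resid = y - y_hat

  # standardize (roughly what SPSS's ZPred and ZResid are)
  zpred = (y_hat - y_hat.mean()) / y_hat.std(ddof=1)
  zresid = (resid - resid.mean()) / resid.std(ddof=1)

  plt.scatter(zpred, zresid)                  # look for an even band around zero
  plt.axhline(0, color="gray")
  plt.xlabel("ZPred")
  plt.ylabel("ZResid")
  plt.show()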

What if one or more of the assumptions are not met?

Look at your data to determine what’s most appropriate. Are there outliers? Is there a problem with one of your predictor variables? Consider transforming your data or restricting your analyses to variables within a limited range, which might reduce the impact of violations. In general, the larger your sample, the less of a negative impact these violations should have.


When would I need to transform my data?

Say, for instance, that your sample data are very positively skewed. A log or square-root transformation may help a lot. Consider doing the transform before deleting or Winsorizing outliers, because the transformation may bring the outliers into the distribution.
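
For example, here is a sketch with simulated positively skewed data (note that scipy’s skewness formula may differ slightly from the one SPSS reports):

  import numpy as np
  from scipy.stats import skew

  rng = np.random.default_rng(0)
  raw = rng.lognormal(mean=0, sigma=1, size=500)   # strongly positively skewed

  print(f"raw skew:  {skew(raw):.2f}")
  print(f"log skew:  {skew(np.log(raw)):.2f}")     # log transform
  print(f"sqrt skew: {skew(np.sqrt(raw)):.2f}")    # milder square-root transform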


Why would I want to do a Fisher’s transformation?

Fisher's transformation of r, aka r′ aka Zr: r′ = ½ [ln(1+r) − ln(1−r)]

The sampling distribution of r is skewed when rho is not equal to zero, but the sampling distribution of r′ is approximately normal. This way, you can do significance tests that require normal distributions.

Example where Fisher’s transformation is useful

In lecture we used an example where we hypothesized that the population correlation ρ = .80 and wanted to see whether our sample correlation, r = .60, gave statistically significant evidence that it came from a population where the correlation was not .80. The sampling distribution for r if Ho is true is negatively skewed (centered around .80, with an upper limit of +1 and a lower limit of -1). We transformed r and rho to obtain an approximately normal sampling distribution, which allowed us to run a z-test to see whether our sample correlation came from the hypothesized population correlation.

z-test: z = (r′ − ρ′) / σr′

error term = σr′ = 1 / √ (n – 3)
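
Here is this z-test as a Python sketch. The lecture example did not specify a sample size, so n = 28 is a made-up value; np.arctanh computes Fisher's transform, since ½ [ln(1+r) − ln(1−r)] = arctanh(r):

  import numpy as np
  from scipy.stats import norm

  r, rho0, n = 0.60, 0.80, 28        # n is made up for illustration

  r_prime = np.arctanh(r)            # Fisher's transform of the sample r
  rho_prime = np.arctanh(rho0)       # Fisher's transform of the hypothesized rho
  se = 1 / np.sqrt(n - 3)            # error term

  z = (r_prime - rho_prime) / se
  p = 2 * norm.sf(abs(z))            # two-tailed p-value
  print(f"z = {z:.2f}, p = {p:.4f}")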

How could we construct a confidence interval for the population correlation?

You can use a confidence interval to describe the precision of your estimate of the population correlation. Begin by constructing a confidence interval for ρ′.

r′ ± (Z α/2) (1 / √ (n-3))

To report this result, you should convert the limits back from r′ to r. You can use StatWISE or apply the formulas given in Howell and in class.
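
A sketch of a 95% confidence interval in Python, using np.tanh to convert the limits back from r′ to r (in place of StatWISE or the hand formulas); n is again a made-up value:

  import numpy as np
  from scipy.stats import norm

  r, n, alpha = 0.60, 28, 0.05

  r_prime = np.arctanh(r)
  half_width = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)   # (Z alpha/2) * (1 / sqrt(n - 3))

  lo, hi = r_prime - half_width, r_prime + half_width     # interval for rho'
  print(f"95% CI for rho: ({np.tanh(lo):.3f}, {np.tanh(hi):.3f})")   # back to r units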

If you are collecting correlations from multiple groups, how can you tell if they are similar enough to pool them together to estimate the true population correlation?

Ho: ρ1 = ρ2 = ρ3 = … = ρk

You would use a chi-square test as described in class. If your χ² is statistically significant (you reject the null hypothesis), you have evidence that your samples did not come from populations with the same correlation. If you fail to reject the null hypothesis, it may be reasonable to pool the estimates of the population correlation, as they are not drastically different from each other. Caution: with small samples it is harder to detect differences, so you should also use logic and consider how the data were collected.
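
Here is a sketch of the test, assuming the weighted chi-square statistic given in Howell, χ² = Σ (nj − 3)(r′j − r̄′)² with k − 1 degrees of freedom; verify that this matches the formula from class. The correlations and sample sizes are made up:

  import numpy as np
  from scipy.stats import chi2

  rs = np.array([0.55, 0.62, 0.48])   # correlations from k = 3 groups
  ns = np.array([40, 55, 35])         # group sample sizes

  r_primes = np.arctanh(rs)           # Fisher-transform each r
  w = ns - 3                          # weights
  r_prime_bar = np.sum(w * r_primes) / np.sum(w)   # weighted mean r'

  chi_sq = np.sum(w * (r_primes - r_prime_bar) ** 2)
  p = chi2.sf(chi_sq, df=len(rs) - 1)
  print(f"chi-square = {chi_sq:.2f}, p = {p:.4f}")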

How do you pool correlations?

Use the r′ average formula, then transform the result back to r. Once you have pooled these groups into a single r, you can do a z-test or a t-test to determine whether this correlation is significantly different from zero (here, the null is ρ = 0).
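
A sketch of pooling, reusing the made-up groups above. The r′ average weights each group by (nj − 3); the pooled standard error 1/√Σ(nj − 3) used for the z-test is an assumption by analogy with the single-sample formula:

  import numpy as np
  from scipy.stats import norm

  rs = np.array([0.55, 0.62, 0.48])
  ns = np.array([40, 55, 35])

  w = ns - 3
  r_prime_bar = np.sum(w * np.arctanh(rs)) / np.sum(w)   # r' average
  r_pooled = np.tanh(r_prime_bar)                        # transform back to r

  z = r_prime_bar * np.sqrt(np.sum(w))                   # test against rho = 0
  p = 2 * norm.sf(abs(z))
  print(f"pooled r = {r_pooled:.3f}, z = {z:.2f}, p = {p:.4g}")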

What is power?

Power is the probability that you will detect an effect that is actually there. You need to specify any three of the following terms to compute the fourth.

B = Beta = [1 – power]

E = effect size

A = alpha (do you use α=.05? .01?)

N = sample size
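
As an illustration, here is a sketch of one way to compute power for testing Ho: ρ = 0, using the Fisher-transform approximation; the effect size, alpha, and sample size are all made-up values:

  import numpy as np
  from scipy.stats import norm

  rho, alpha, n = 0.30, 0.05, 84     # true effect, alpha level, sample size

  delta = np.arctanh(rho) * np.sqrt(n - 3)   # shift of r' under H1, in SE units
  z_crit = norm.ppf(1 - alpha / 2)           # two-tailed critical value

  power = norm.sf(z_crit - delta) + norm.sf(z_crit + delta)
  print(f"power = {power:.3f}, Beta = {1 - power:.3f}")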

What are residuals?

Residuals are the part of y that cannot be predicted by, accounted for, or explained by x using a linear model. They are what remains unexplained.

How are residuals useful?

Residuals can be used to measure how much variance of a variable cannot be explained by one or more of the other variables. For instance, if you use age to predict vocabulary for a sample of elementary school children, the error in prediction is the residual: that part of vocabulary that cannot be predicted by age. We can use that residual when we are interested in age-adjusted vocabulary as a predictor. With this residual variable, we have ‘partialled out’ the effects of age on vocabulary. If we also partial out the effects of age on a math score, we can use the correlation between the two residual scores to examine the relationship between math and vocabulary scores independent of (“controlling for,” “removing,” or “holding constant”) the effects of age.

What is a partial correlation?

A partial correlation is what we get when we remove variance associated with some third variable from the two variables to be correlated, as described above. In other words, it is a correlation between two variables when the effects of one or more related variables are removed from both variables. A partial correlation is computed between two residuals. In the example above the partial correlation gives us an estimate of the correlation between vocabulary and math for children of any one given age. In that sense, we are ‘holding constant’ the effects of age, or removing variance associated with age.
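
Here is a sketch of the age/vocabulary/math example with simulated data, computing the partial correlation as the correlation between two residuals (all numbers are made up):

  import numpy as np

  rng = np.random.default_rng(2)
  n = 300
  age = rng.uniform(6, 12, n)                   # simulated elementary-school ages
  vocab = 5 * age + rng.normal(0, 4, n)         # vocabulary driven largely by age
  math_score = 4 * age + rng.normal(0, 4, n)    # math driven largely by age

  def residualize(y, x):
      # residuals of y after linear prediction from x
      b = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
      return y - (y.mean() + b * (x - x.mean()))

  # partial correlation: age is removed from BOTH variables
  r_raw = np.corrcoef(vocab, math_score)[0, 1]
  r_partial = np.corrcoef(residualize(vocab, age), residualize(math_score, age))[0, 1]
  print(f"raw r = {r_raw:.2f}, partial r controlling for age = {r_partial:.2f}")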

What is a semi-partial correlation?

If we were interested in predicting verbal scores (Y), we might begin with age (X1) as our first predictor, and then ask whether math scores (X2) improve the prediction beyond age (X1). In symbolic terms, we are interested in the relationship between Y and that part of X2 that is independent of X1. This can be measured with the correlation between Y and the residual part of X2 when X1 is used to predict X2. This is a semi-partial correlation because X1 is partialled out of X2 but not out of Y. In hierarchical regression, the R² change for a variable X2 after X1 has been entered is the squared semi-partial correlation.
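
Here is a sketch showing, with simulated data, that the R² change for math entered after age equals the squared semi-partial correlation (the data and helper function are made up):

  import numpy as np

  rng = np.random.default_rng(2)
  n = 300
  age = rng.uniform(6, 12, n)
  math_score = 4 * age + rng.normal(0, 4, n)
  verbal = 5 * age + 0.5 * math_score + rng.normal(0, 4, n)   # math adds beyond age

  def r_squared(y, X):
      # R-squared from an OLS fit of y on the columns of X (plus an intercept)
      X1 = np.column_stack([np.ones(len(y)), X])
      y_hat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
      return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

  r2_step1 = r_squared(verbal, age)                                # age alone
  r2_step2 = r_squared(verbal, np.column_stack([age, math_score])) # age, then math

  # semi-partial r: correlate Y with the part of math that age cannot predict
  b = np.corrcoef(age, math_score)[0, 1] * math_score.std(ddof=1) / age.std(ddof=1)
  math_resid = math_score - (math_score.mean() + b * (age - age.mean()))
  sr = np.corrcoef(verbal, math_resid)[0, 1]

  print(f"R-squared change = {r2_step2 - r2_step1:.4f}")
  print(f"squared semi-partial = {sr ** 2:.4f}")   # should match the R-squared change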

What is R?

R is a correlation between a criterion variable and a weighted combination of predictors.

R can assume values between 0 and 1.

What is R² ?

R² is how much total variance of a criterion variable is accounted for by the predictor variables, using the optimally weighted combination of predictors. In our running example, R² indicates the proportion of variance in math score accounted for by verbal score and age together. Note that this is an overall measure of the strength of association, and does not reflect the extent to which either individual independent variable is associated with the dependent variable. R² is also called the ‘coefficient of determination.’

What is R² added aka ∆ R² aka R² change?

R² change is a measure of contribution of a new variable (or set of variables) above and beyond what was explained by the initial variable(s). For instance, you can compute the R² (contribution) of age in predicting math score. You can then look at how much variance the verbal score contributes to your ability to predict math score, above and beyond age – this is indicated by your R² change value (which is also a semi-partial r squared).


What is adjusted or shrunken R² ?

We expect to see some relationship between predictors and the DV due to chance alone in any given sample. The adjusted R² takes this into account and shrinks the sample R² a little to give an estimate of the true R² in the population. If the shrunken R² is negative, our best estimate of the population value is zero. In general, the shrinkage is greater when the sample size is smaller or the number of predictors is larger.

If the population correlations were all zero between all pairs of independent and dependent variables, and we took a random sample of n cases with k variables, the expected value of R squared that we would observe is k/(n-1).
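
Here is a sketch that checks the k/(n − 1) claim by simulation and applies a common adjusted-R² formula, 1 − (1 − R²)(n − 1)/(n − k − 1); that formula is an assumption here, so verify it against the one given in class:

  import numpy as np

  rng = np.random.default_rng(3)
  n, k = 50, 5                        # small sample, several predictors

  r2s = []
  for _ in range(2000):               # all true correlations are zero
      X = rng.normal(size=(n, k))
      y = rng.normal(size=n)
      X1 = np.column_stack([np.ones(n), X])
      y_hat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
      r2s.append(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

  mean_r2 = np.mean(r2s)
  adj = 1 - (1 - mean_r2) * (n - 1) / (n - k - 1)
  print(f"mean null R-squared = {mean_r2:.3f} (expected {k / (n - 1):.3f})")
  print(f"adjusted R-squared of that mean = {adj:.3f}")   # near zero, as it should be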

What is stepwise regression?

Stepwise regression is the tool of the Devil.

Stepwise regression capitalizes on chance, giving models that may not be replicable with new data and may not be interpretable theoretically. Tests of statistical significance for stepwise models given by SPSS are wrong: they don't take into account how many variables were considered. Stepwise may be useful for hypothesis generation, but make appropriate adjustments if you wish to test statistical significance; Wilkinson's tables can be useful in some cases.
