Multiple Regression Final Review: Spring 2009

Nadra Lisha & Alexis Alabastro (edited by Dale Berger)

What is Multiple Regression?

Multiple regression is a way to analyze data by looking at how two or more independent, or "predictor," variables relate to, and jointly predict, the dependent, or "criterion," variable of interest.

Whaaaat?

For instance, you may be interested in predicting exercise behaviors (your “criterion variable”) using information on eating habits (predictor variable X1), attitudes toward exercise (predictor variable X2), and BMI scores (predictor variable X3). Information on these predictor variables can be used to create a regression equation. The regression equation is a formula that can be used to estimate the value of your criterion variable, based on the values of your predictor variables. In this example, you could use a regression equation to predict how much someone will exercise if you know their eating habits, their attitudes toward exercise, and their BMI score.

So what does a regression equation look like?

Ŷ = A + B1*X1 + B2*X2 + B3*X3 + ... + Bk*Xk

Ŷ = Your "best" estimate of the criterion variable (exercise behaviors), based on the weighted combination of predictors as shown in the regression equation

A = Constant

X1= eating habits (where higher means healthier)

X2 = attitudes toward exercise (higher means more favorable)

X3 = BMI (Body Mass Index)

Bi = unstandardized regression coefficients, aka “B weights”
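To make this concrete, here is a minimal sketch of fitting such an equation and using it for prediction. It uses Python with statsmodels rather than the handout's PASW, and the data and the new person's scores are entirely hypothetical:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: 100 people measured on eating habits, attitudes, and BMI
    rng = np.random.default_rng(0)
    eating = rng.normal(50, 10, 100)     # X1: higher = healthier
    attitude = rng.normal(0, 1, 100)     # X2: higher = more favorable
    bmi = rng.normal(25, 4, 100)         # X3: Body Mass Index
    exercise = 2 + 0.05*eating + 1.5*attitude - 0.10*bmi + rng.normal(0, 1, 100)

    X = sm.add_constant(np.column_stack([eating, attitude, bmi]))
    model = sm.OLS(exercise, X).fit()
    A, B1, B2, B3 = model.params         # constant and the three B weights

    # Predicted exercise (Y-hat) for someone with eating=55, attitude=1, BMI=27
    y_hat = A + B1*55 + B2*1 + B3*27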

What is a Model Summary?

a. Predictors: (constant), eating habits, attitudes toward exercise, BMI

b. Model: Here we are considering all of the variables (eating habits, attitudes toward exercise, BMI) at once in their ability to predict the criterion variable (in this example, your "exercise behaviors").

c. R: This value is the correlation between the weighted composite of all three predictor variables, as described by the regression equation, and the actual values on the criterion variable.

d. R²: This value is the proportion of the total variance in Y that is accounted for by the weighted combination of the predictor variables. This value of .489 indicates that 48.9% of the variance in exercise behaviors can be predicted from the weighted combination of eating habits, attitudes toward exercise, and BMI. Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable. R² is also called the 'coefficient of determination.'

e. Adjusted or Shrunken R²:

We expect to see some relationship between predictors and the DV due to chance alone in any given sample. The adjusted R² takes this into account and shrinks the sample R² a little to give an estimate of the true R² in the population.

f. Standard Error of the Estimate: The standard error of the estimate, also called the root mean square error, is the standard deviation of the error term, and is the square root of the Mean Square Residual (or Error).
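To connect these pieces, here is a Python/statsmodels sketch with made-up data (not the handout's PASW output) showing where the model summary quantities come from:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                    # three hypothetical predictors
    y = X @ [0.5, 0.3, -0.2] + rng.normal(size=100)  # hypothetical criterion
    fit = sm.OLS(y, sm.add_constant(X)).fit()

    # n = number of cases (100 here), k = number of predictors (3 here)
    R2     = fit.rsquared            # proportion of variance in Y accounted for
    R      = np.sqrt(R2)             # correlation of the weighted composite with Y
    adj_R2 = fit.rsquared_adj        # equals 1 - (1 - R2)*(n - 1)/(n - k - 1)
    se_est = np.sqrt(fit.mse_resid)  # standard error of estimate = root MS residual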

What is the difference between B weights and Beta weights?

B weights:

  • Are in the original units of the variables: units of the Y variable per original unit on X, for prediction.
  • Determining importance: You cannot determine which variable is more important (i.e., explains more of the variance in the DV) by simply comparing the sizes of the different B weights. This is because the different B's are affected by the SDs of the X variables and the correlations among the predictors. To compare the sizes, look at Beta weights (see the sketch after the Beta weights list below).
  • Interpretation: For the example where the B for predictor X = 4.5, you can interpret the B weight to say, "if everything was held constant except this one variable X, and we increased X by 1 unit, the predicted value of the DV would be increased by 4.5 units."
  • Interpret the coefficients for main effects only in models that do not have interactions entered. It doesn't make sense to interpret the part of a main effect that is independent of an interaction term when the interaction term is defined as the product of the main effects. With centered predictors the problem is greatly reduced.
  • The constant can be interpreted as the predicted value on Y when all of the predictors in the model are zero. When interpreting for Grandma, be sure to describe what this means in terms of the specific variables in the model. “For people of the same age, …”
  • Interpreting interaction term B weights: see “interactions” below.

Beta weights:

  • If significant, this means that the predictor uniquely contributes to predicting the DV beyond all other predictors in that model. (Note: when looking at the coefficients table, the B and Beta weights are interpreted in the context of all other variables in that model, regardless of whether they are entered before or after the given predictor. Order matters in the model summaries only when looking at changes in R².)
  • Determining importance: to determine which variable in a set is most important (i.e., the strongest predictor) consider simple correlations and beta weights. Because betas are standardized, they can be meaningfully compared to one another. Warning: if predictor variables are highly correlated with each other (multicollinearity) the beta weights may be very unstable and apparent differences may not be statistically significant. Also, with high collinearity the unique portion of a variable may be hard to interpret.

  • It is useful to compare beta for a predictor to the correlation of that predictor with Y. The correlation is an indication of the relationship between the predictor and Y ignoring all other predictors, while the beta coefficient is an indication of the unique relationship between the predictor and Y controlling for all other variables in the model.
  • What if the overall model is significant, but individual betas are not? See multicollinearity below.
  • Interpret beta weights of main effects only in models NOT containing interaction terms.
  • Interpreting interaction term beta weights: see “interactions” below.
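As a rough illustration (again Python/statsmodels with hypothetical data, not from the handout), beta weights can be obtained by rescaling the B weights by the standard deviations, or equivalently by refitting the model on z-scored variables:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3)) * [10, 1, 4]       # predictors on very different scales
    y = X @ [0.05, 1.5, -0.10] + rng.normal(size=100)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    B = fit.params[1:]                               # unstandardized B weights

    # Beta weights: B rescaled so all variables are in standard-deviation units
    beta = B * X.std(axis=0, ddof=1) / y.std(ddof=1)

    # Equivalent: fit the model with z-scored X and y (no constant needed)
    zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    beta_check = sm.OLS(zy, zX).fit().params         # same values as beta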

What is the relationship between significance tests of B, Beta, and R²-added?

  • When do these give an identical test of significance? If variable X is added to a model by itself, then the R² added is a measure of the unique contribution of that variable beyond all of the other variables in the model. Tests of the B and beta in that model also test the unique contribution of each variable beyond all of the other variables in the model.
  • This means that the observed p-value for all three of these will be the same. Note: variable X (with one degree of freedom) must be added by itself for this to be true.
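A quick way to see this (a Python/statsmodels sketch with made-up data) is to add one predictor to a model by itself and compare the p-value for its t test with the p-value for the R²-change F test:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))
    y = X @ [0.4, 0.3, 0.2] + rng.normal(size=100)

    reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()     # first two predictors only
    full = sm.OLS(y, sm.add_constant(X)).fit()               # add the third by itself

    f_val, p_change, df_diff = full.compare_f_test(reduced)  # test of R-square added
    p_b = full.pvalues[3]                                    # t test of B (and beta) for X3
    # p_change and p_b are identical, and f_val equals full.tvalues[3]**2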

What is multicollinearity?

Multicollinearity for a variable refers to the extent of overlap between that variable and the other predictor variables. Note that multicollinearity does not include the criterion variable at all. However, if two (or more) variables have high multicollinearity, it is difficult to assess the unique effect of each independent variable on the criterion variable.

How to determine if your variables are multicollinear?

  1. Large correlations between predictors
  2. Unstable B weights: large changes in your regression coefficients when a predictor variable is added to or taken out of the model
  3. Non-significant regression coefficients, but a significant F-value for your overall model
  4. You can assess multicollinearity by examining “tolerance” and the Variance Inflation Factor (VIF) in PASW.

What is tolerance?

tolerance = 1 − Ri² (where Ri² is the square of the multiple correlation predicting Xi using all the other predictor variables in the equation. Ri² is a measure of collinearity)

A low tolerance value for a predictor variable is accompanied by a large standard error for its regression coefficient, and perhaps non-significance, because multicollinearity may be an issue.
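As a sketch of how tolerance and VIF could be computed by hand (Python with statsmodels; the data and variable ordering are made up, and VIF is simply 1/tolerance):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.3, size=100)    # deliberately collinear with x1
    x3 = rng.normal(size=100)
    X = sm.add_constant(np.column_stack([x1, x2, x3]))

    # Tolerance for x1: regress x1 on the other predictors, take 1 - R-square
    other = sm.add_constant(np.column_stack([x2, x3]))
    tol_x1 = 1 - sm.OLS(x1, other).fit().rsquared

    # VIF from statsmodels (index 1 = x1, since column 0 is the constant)
    vif_x1 = variance_inflation_factor(X, 1)     # equals 1 / tol_x1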

What is dummy coding and why is it used?

New variables are created to indicate group membership for categorical variables. For instance, to indicate sex, we would put a “1” to represent females, and a “0” to represent everyone else (i.e., males) in a given regression model. If we have three groups, we need two dummy variables to represent group membership. In general, you need to create (# of levels in categorical variable – 1) dummy variables to fully represent the original categorical variable. When the dummy variables are entered into a regression model as a set, the test of R² change is a test of the categorical variable.
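For example, here is a small Python/pandas sketch (the group labels are hypothetical) showing that a three-level categorical variable needs two dummy variables:

    import pandas as pd

    df = pd.DataFrame({"diet": ["vegan", "omnivore", "vegetarian", "omnivore", "vegan"]})

    # Two dummies fully represent the three-level variable; "omnivore" is the
    # reference group (coded 0 on both dummies).
    dummies = pd.get_dummies(df["diet"], prefix="diet", drop_first=True)
    df = pd.concat([df, dummies], axis=1)
    # Entering diet_vegan and diet_vegetarian as a set, the test of R-square change
    # is the test of the overall diet effect.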

How do you deal with missing data?

  1. Listwise deletion: PASW throws out every case that is missing data on even one of the variables used in the regression model. This is the default in PASW. It can be problematic because a lot of data can be lost.
  2. Pairwise deletion: calculate all pairwise correlations using whatever cases have data on each pair of variables. This can be problematic because each correlation may be based on a different set of cases.
  3. Replace missing values with means: This can be problematic if there is a systematic reason for your missing values. Imputing means may skew your results; it also reduces variance.
  4. Hot-deck: in a survey, take the value from the participant before, which preserves variability and captures some neighborhood shared variability.
  5. Multiple regression: estimate the missing value by using multiple regression with other variables where you have data.
  6. Multiple imputation: uses a resampling approach to create multiple estimates of values for missing data; this is state-of-the-art, but not provided in the PASW GradPack (yet).
  7. M technique: create a "missingness" dummy variable to pair with a target variable that has missing data. Code the missingness variable as 0 for cases where a valid score is present in the target variable, and code the missingness variable as 1 for cases where the target variable has missing data. Then code missing data as a constant in the target variable (the mean is a good choice) and don't tell PASW that this is a 'missingness' code (then PASW will treat it as a real value). This 'plugged variable' can be used in regression on a step following the missingness dummy variable. You can check to see if the missingness variable is correlated with any variables; this will help you characterize who is missing data. (A sketch of this technique appears after this list.)
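A minimal sketch of the M technique (Python/pandas rather than PASW; the variable names and values are hypothetical):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"bmi": [22.0, np.nan, 27.5, np.nan, 31.0]})

    # Missingness dummy: 1 where BMI is missing, 0 where a valid score is present
    df["bmi_missing"] = df["bmi"].isna().astype(int)

    # "Plugged" variable: replace missing BMI with the mean of the valid scores
    df["bmi_plugged"] = df["bmi"].fillna(df["bmi"].mean())

    # In the regression, enter bmi_missing on one step and bmi_plugged on the next;
    # correlating bmi_missing with other variables helps characterize who is missing.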

What is linearity?

Linearity is an assumption of regression – that your predictor variable(s) and your criterion variable are linearly related.

How do you check for violation of linearity?

On your regression output, look at the plot of residuals versus predicted values; the points should be symmetrically distributed around a horizontal line. Look out for a "bowed" pattern, which indicates that the model makes systematic errors when it is making unusually large or unusually small predictions; look for outliers that may be unduly influential; and look for heteroscedasticity, where variance around the line is greater in some regions of the predicted values than in others.
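One way to make this plot outside of PASW (a Python sketch with matplotlib and the same kind of hypothetical OLS fit used above):

    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 2))
    y = X @ [0.6, 0.4] + rng.normal(size=100)
    fit = sm.OLS(y, sm.add_constant(X)).fit()

    # Residuals vs. predicted values; look for bowing, outliers, or a fan shape
    plt.scatter(fit.fittedvalues, fit.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")
    plt.show()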

What is non-orthogonality?

Non-orthogonality refers to the correlation between predictor variables. To the extent that your predictor variables are unique and independent of each other, they can be considered orthogonal. It is desirable that predictors be orthogonal or at least relatively uncorrelated, to avoid unstable regression coefficients (you are trying to determine what each predictor uniquely contributes to the model).

What is the standard error of estimate?

This is a measure of the accuracy of predictions made with a regression line. It indicates the standard deviation of the errors around your regression line.
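In formula form, following directly from the definition of the Mean Square Residual: standard error of estimate = √MS_residual = √(SS_residual / (n − k − 1)), where n is the number of cases and k is the number of predictors.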

What are interactions?

There is an interaction between X1 and X2 in predicting Y if the relationship between X1 and Y depends on the level of X2.

How do you interpret interactions?

An interaction is ‘differences of differences’. That is, the difference in the predicted value of Y for low and high values of X1 is different when X2 is low compared to when X2 is high.

Interpretation of interaction B and beta weights: A direct interpretation is possible, but tricky. The best approach is to graph the interaction and interpret it from the graph (a sketch appears after the list below).

  • If X1 is a continuous predictor and X2 is a categorical predictor, you can compare the correlation between X1 and Y for the different levels of X2.
  • If X1 and X2 are both continuous variables, you can choose a high and a low value of X2 that are meaningful and compare the correlations between X1 and Y for these two levels of X2.
  • If X1 and X2 are both categorical predictors, you can conduct ‘simple effects’ to compute predicted Y values associated with different levels of X1 and compare these predictions at the different levels of X2.
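A small sketch of fitting and probing an interaction between two continuous predictors (Python/statsmodels; the variables and the high/low cut at ±1 SD are illustrative choices):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x1 = rng.normal(size=200)
    x2 = rng.normal(size=200)
    y = 0.5*x1 + 0.3*x2 + 0.4*x1*x2 + rng.normal(size=200)

    # Center the predictors, then add their product as the interaction term
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    X = sm.add_constant(np.column_stack([x1c, x2c, x1c * x2c]))
    fit = sm.OLS(y, X).fit()
    b0, b1, b2, b3 = fit.params

    # Simple slopes of Y on X1 at low (-1 SD) and high (+1 SD) values of X2;
    # if b3 is nonzero, the X1-Y relationship depends on the level of X2
    slope_low  = b1 + b3 * (-x2c.std(ddof=1))
    slope_high = b1 + b3 * ( x2c.std(ddof=1))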
