Name:

April 6, 2012

T.A. name/Class time:

MW Lecturer:

Lab 11: Chapter 11

The file “lifesat.sav” on the course website contains simulated data for 30 adults seven years after graduating from college (modified from a problem created by David W. Stockburger, Missouri State). It includes the following variables:

Finish (0 = college program not finished, 1 = college program finished)

IncomeC = annual Income in College in thousands of dollars

HealthC = Score on a Health Inventory in College

LifeSatC = Score on a Life Satisfaction Inventory in College

LifeSat = Score on a Life Satisfaction Inventory seven years after College

SES = Socio-Economic Status of parents

Income = Income seven years after college in thousands of dollars

We are interested in determining whether income seven years after college can be predicted from the above variables. To do this, we will use the data set to find the best multiple regression model for predicting Income, check all regression assumptions, and report the results. Use a 5% significance level for this entire problem set.

1. (1 point) Find the correlation of Income with each of the other six variables. List the variables (name, r) in the chart below from strongest to weakest correlation with Income. Put a star (*) above each variable that has a significant correlation with Income.
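The ranking and the significance star in question 1 can be sketched in Python as below. This is an illustration only and does not replace the SPSS correlation table: it computes Pearson's r, sorts the variables by |r|, and flags a correlation as significant when |t| = |r|√(n − 2)/√(1 − r²) exceeds a two-sided 5% critical value that you supply from a t-table (for n = 30, df = 28, that value is about 2.048). The function names are hypothetical, and the sketch assumes `data` is a dictionary mapping each column name to a list of its values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def rank_correlations(data, response, t_crit):
    """Return (name, r, significant) tuples sorted from strongest to weakest |r|."""
    y = data[response]
    n = len(y)
    rows = []
    for name, x in data.items():
        if name == response:
            continue  # skip the response itself
        r = pearson_r(x, y)
        rows.append((name, r, abs(t_statistic(r, n)) > t_crit))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

For example, `rank_correlations(data, "Income", 2.048)` would return the six explanatory variables in the order the chart asks for, with `True` marking the ones that get a star.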

Variable    | (strongest) |   |   |   |   | (weakest)
Correlation |             |   |   |   |   |

2. (1 point) Use SPSS to generate a matrix of scatterplots of all pairs of variables. On the graph, label the scatterplot of the pair of variables with the most obvious linear relationship and the scatterplot that looks least like a linear relationship.

Note: Although there are several ways to choose the best multiple regression model, we will start with the full model and delete variables one at a time (backward elimination).
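The backward-elimination procedure used in questions 3 to 6 can be sketched in Python with only the standard library. This is a hedged illustration, not what SPSS does internally: it fits ordinary least squares through the normal equations, computes each coefficient's t statistic, and drops the variable with the smallest |t| (equivalently, the largest P-value within one model) until every remaining |t| exceeds a critical value. Because the standard library has no t-distribution, `t_crit` is a hypothetical callable you fill from a t-table, keyed by the residual degrees of freedom n − p − 1. All function names here are hypothetical.

```python
import math

def invert(m):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols(data, response, predictors):
    """Least-squares fit. Returns ({name: (coef, t)}, residual df); 'const' is the intercept."""
    y = data[response]
    n = len(y)
    names = ["const"] + list(predictors)
    X = [[1.0] + [data[p][i] for p in predictors] for i in range(n)]
    k = len(names)
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    b = [sum(inv[a][c] * xty[c] for c in range(k)) for a in range(k)]
    sse = sum((y[i] - sum(b[a] * X[i][a] for a in range(k))) ** 2 for i in range(n))
    s2 = sse / (n - k)  # residual variance with df = n - p - 1
    fit = {names[a]: (b[a], b[a] / math.sqrt(s2 * inv[a][a])) for a in range(k)}
    return fit, n - k

def backward_eliminate(data, response, predictors, t_crit):
    """Drop the predictor with the smallest |t| one at a time until all |t| > t_crit(df)."""
    current = list(predictors)
    while current:
        fit, df = ols(data, response, current)
        worst = min(current, key=lambda p: abs(fit[p][1]))
        if abs(fit[worst][1]) > t_crit(df):
            break  # every remaining variable is significant
        current.remove(worst)
    return current
```

The lab asks you to judge each drop yourself (for example, by comparing R² and the standard error before and after), so treat the automatic stopping rule here as a simplification.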

3. (1 point) Perform a regression of Income on all six explanatory variables. Complete the first line of the table below:

Explanatory variables in model / R² / Standard error of the estimate / Any variables not significant? (list them and their P-values)
LifeSat, Finish, SES, IncomeC, HealthC, LifeSatC

4. (2 points) Now drop the least significant variable (the one with the largest P-value) and re-run the regression. Was dropping that variable a good change? Explain why or why not in the space below the chart.

Explanatory variables in model / R² / Standard error of the estimate / Any variables not significant? (list them and their P-values)

5. (2 points) Now drop the least significant remaining variable and re-run the regression. Was dropping that variable a good change? Explain why or why not in the space below the chart.

Explanatory variables in model / R² / Standard error of the estimate / Any variables not significant? (list them and their P-values)

6. (2 points) Again, drop the least significant remaining variable and re-run the regression. Was dropping that variable a good change? Explain why or why not in the space below the chart.

Explanatory variables in model / R² / Standard error of the estimate / Any variables not significant? (list them and their P-values)

7. (2 points) Should we drop any more variables from the model? Why or why not?

8. (2 points) State the regression equation for the final model from question 6.
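As a reminder of the general shape of the answer, the least-squares regression equation with k explanatory variables has the form below, where $b_0$ is taken from the (Constant) row and $b_1, \dots, b_k$ from the B column under Unstandardized Coefficients in the SPSS Coefficients table, and $x_1, \dots, x_k$ stand for whichever explanatory variables remain in your final model (the symbols are generic placeholders, not the lab's answer):

```latex
\widehat{\text{Income}} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k
```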

9. (2 points) Using the equation for the final model, predict the income of student number 12. Show your work.

10. (2 points) What is the residual for the prediction in question 9? Show your work.
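The arithmetic in questions 9 and 10 is a plug-in prediction followed by a subtraction, since a residual is the observed value minus the predicted value (this is also what SPSS saves as the unstandardized residual). A minimal sketch, where the function names, coefficient values, and variable names are hypothetical placeholders rather than results from lifesat.sav:

```python
def predict(intercept, coefs, x):
    """Predicted response: intercept + sum of coefficient * value for each variable."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)

def residual(observed, predicted):
    """Residual = observed value - predicted value."""
    return observed - predicted

# Hypothetical model y-hat = 2.0 + 3.0*x1 - 0.5*x2 and one hypothetical case.
p = predict(2.0, {"x1": 3.0, "x2": -0.5}, {"x1": 4, "x2": 10})
print(p, residual(10.0, p))  # prints 9.0 1.0
```

A positive residual means the model under-predicted that person's income; a negative one means it over-predicted.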

11. (1 point) Show your Normal probability plot of the residuals for the final model. Is the normality assumption met? How do you know?
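To see what the Normal P-P plot compares, the sketch below builds its points by hand: each residual is standardized, the residuals are sorted, and each one's observed cumulative proportion is paired with the cumulative probability a standard Normal distribution would give it. Points close to the 45-degree line (small gaps between the two values) are consistent with Normality. This is an assumption-laden illustration, not SPSS's exact computation: it standardizes with the sample mean and standard deviation and uses the (i − 0.5)/n plotting position, which is one of several common choices. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals):
    """(observed, expected) cumulative probabilities for a Normal P-P plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    z_sorted = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [((i - 0.5) / n, std_normal.cdf(z)) for i, z in enumerate(z_sorted, start=1)]

def max_gap(points):
    """Largest vertical distance of any P-P point from the 45-degree line."""
    return max(abs(observed - expected) for observed, expected in points)
```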

12. (2 points) Create residual plots of the residuals against each explanatory variable remaining in the final model.

a) Is the assumption of constant variance met? How do you know?

b) Is the assumption of linearity met? How do you know?

SPSS Instructions for Lab 11

Print out the following SPSS output and submit it with the lab: the matrix scatterplot of all the variables, the correlation table, the model summary, ANOVA table, and coefficient table for all 3 steps of the regression, the Normal probability plot, and the residual plots.

Download the data set from the course website, save it to your desktop, and open it in SPSS.

(1) Correlation Matrix

Analyze → Correlate → Bivariate. Select all the variables (explanatory and response; note that “subject” is not a variable) and move them into the Variables box. Then click OK.

(2) Scatter Plot

Graphs → Legacy Dialogs → Scatter/Dot → Matrix Scatter → Define.

Move all the variables into the “Matrix Variables” box. Then click OK.

(3) LSR Equation and Normal Probability Plot

1. Analyze → Regression → Linear

2. Move “Income” to the Dependent box and all the explanatory variables you are using to the Independent(s) box → OK.

3. Based on the P-values, decide whether to remove a variable or leave it in the model, removing only one variable at a time. Repeat steps 1, 2, and 3 of these instructions, each time removing from the list of explanatory variables the variable you have decided to drop.

When you have decided on your final set of variables, complete steps 4-6 below.

4. Analyze → Regression → Linear

5. Click “Plots” on the right. Check the “Normal probability plot” box → Continue.

6. Click the “Save” button on the right. The “Linear Regression: Save” dialog box will open. Check “Unstandardized” under “Residuals” → Continue → OK.

(4) Residual Plot with Reference Line: Go back to the SPSS “Data View”; you will find a new column named “RES_1” that contains the saved residuals.

Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define

Let “Unstandardized Residuals” be the Y variable and one of your explanatory variables be the X variable. Click OK.

Adding a reference line: Double-click the scatterplot in the output to open the “Chart Editor” window.

Click “Chart” → “Add Chart Element” → Y Axis Reference Line → enter “0” in the “Y Axis Position” box → click “Apply” and close the “Chart Editor” window.

Repeat with each explanatory variable.

The conditions required for the validity of the inference procedures are:

  • The data are a simple random sample from the population. (Look at the story.)
  • The two variables are linearly related. (Look for a linear pattern in the scatter plot OR look for a random scattering of dots on the residual plots.)
  • For a given x-value, the distribution of the y-values in the population is Normal. (Look at the Normal probability (P-P) plot. The points should follow the 45-degree line.)
  • The standard deviation of those Normal distributions is the same at all x-values. (Look at the residual plots. There should be no “funnel” shape to the points.)
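As a rough numeric companion to the visual funnel check in the last condition (not a formal test, and not something this lab asks you to compute), the hypothetical helper below sorts the cases by one explanatory variable and compares the standard deviation of the residuals in the upper half with that in the lower half. A ratio near 1 is consistent with constant spread, while a ratio far from 1 hints at a funnel; the residual plot itself remains the evidence you report.

```python
from statistics import stdev

def spread_ratio(x, residuals):
    """Residual SD for the larger half of x divided by residual SD for the smaller half."""
    pairs = sorted(zip(x, residuals))  # order cases by the explanatory variable
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return stdev(high) / stdev(low)
```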
