Homework 9 Key
(100 points total + 25 points extra credit)
50 points for SPSS commands and 50 points for output.
DATA list
FILE = 'd:\510\2007\data\AFIFI.DAT' RECORDS=2
/ IDNUM 1-4 AGE 5-8 HEIGHT 9-12 SEX 13-15 SURVIVE 16 SHOKTYPE 17-20 SBP1 21-24
MAP1 25-28 HRT1 29-32 DBP1 33-36 CVP1 37-40 (1) BSA1 41-44 (2) CI1 45-48 (2)
APP1 49-52 (1) CIRC1 53-56 (1) UR1 57-60 PLAS1 61-64 (1)
RC1 65-68 (1) HGB1 69-72 (1) HCT1 73-76 (1) TIME1 80
/SBP2 21-24 MAP2 25-28 HRT2 29-32 DBP2 33-36 CVP2 37-40 (1) BSA2 41-44 (2)
CI2 45-48 (2) APP2 49-52 (1) CIRC2 53-56 (1)
UR2 57-60 PLAS2 61-64 (1) RC2 65-68 (1) HGB2 69-72 (1) HCT2 73-76 (1) TIME2 80.
EXECUTE.
* Recode and label SHOKTYPE, SEX, and SURVIVE.
RECODE SHOKTYPE
(2=0) (3 thru 7=1) INTO SHOCK.
EXECUTE.
Value labels shoktype 2 '2: Non-Shock' 3 '3: Hypovolemic'
4 '4: Cardiogenic' 5 '5: bacterial' 6 '6: neurogenic' 7 '7: other'.
Value labels survive 1 '1: lived' 3 '3: died'.
Value labels sex 1 '1: Male' 2 '2: Female'.
RECODE SURVIVE
(3=1) (1=0) INTO DIED.
EXECUTE.
Compute shock_dum2 = (shoktype=2).
Compute shock_dum3 = (shoktype=3).
Compute shock_dum4 = (shoktype=4).
Compute shock_dum5 = (shoktype=5).
Compute shock_dum6 = (shoktype=6).
Compute shock_dum7 = (shoktype=7).
Execute.
*------Alternative Coding for Shock dummy variables, using If, although not necessary.
do if not missing (shoktype).
Compute shockdum2 = (shoktype=2).
Compute shockdum3 = (shoktype=3).
Compute shockdum4 = (shoktype=4).
Compute shockdum5 = (shoktype=5).
Compute shockdum6 = (shoktype=6).
Compute shockdum7 = (shoktype=7).
end if.
execute.
Compute sbpdiff = sbp2 - sbp1.
execute.
/*Question 2: Descriptive Statistics*/
DESCRIPTIVES
VARIABLES=IDNUM AGE HEIGHT SEX SURVIVE SHOKTYPE SBP1 MAP1 HRT1 DBP1 CVP1
BSA1 CI1 APP1 CIRC1 UR1 PLAS1 RC1 HGB1 HCT1 TIME1 SBP2 MAP2 HRT2 DBP2 CVP2
BSA2 CI2 APP2 CIRC2 UR2 PLAS2 RC2 HGB2 HCT2 TIME2 SHOCK DIED shock_dum2
shock_dum3 shock_dum4 shock_dum5 shock_dum6 shock_dum7 sbpdiff
/STATISTICS=MEAN STDDEV MIN MAX .
/*Question 3: Frequencies*/
FREQUENCIES
VARIABLES=SHOKTYPE SHOCK SURVIVE DIED SEX
/ORDER= ANALYSIS .
/*Question 4: Histograms of SBP1 for those who lived and those who died*/
GRAPH
/HISTOGRAM=SBP1
/PANEL ROWVAR=SURVIVE ROWOP=CROSS .
/*Question 5: Independent samples t-test on SBP1 and SBPDIFF */
T-TEST
GROUPS = SURVIVE(1 3)
/MISSING = ANALYSIS
/VARIABLES = SBP1 sbpdiff
/CRITERIA = CI(.95) .
/*Question 6: Paired t-test of SBP1 vs SBP2 */
T-TEST
PAIRS = SBP1 WITH sbp2 (PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.
SORT CASES BY died .
SPLIT FILE
SEPARATE BY died .
T-TEST
PAIRS = SBP1 WITH sbp2 (PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.
SPLIT FILE
OFF.
/*Question 7: Scatterplot of SBP2 vs. SBP1*/
GRAPH
/SCATTERPLOT(BIVAR)=SBP1 WITH sbp2 BY died
/MISSING=LISTWISE .
/*Question 8: Linear Regression of SBP2 on SBP1*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT sbp2
/METHOD=ENTER SBP1
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
/*Question 9: Boxplots of SBP2 by levels of SHOKTYPE*/
EXAMINE
VARIABLES=SBP2 BY SHOKTYPE /PLOT=BOXPLOT/STATISTICS=NONE/NOTOTAL.
/*Question 10: Regression of SBP2 on SHOKTYPE dummy variables*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER shock_dum3 shock_dum4 shock_dum5 shock_dum6 shock_dum7
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
/*Question 11: Correlation matrix and scatterplot matrix*/
CORRELATIONS
/VARIABLES=SBP2 SBP1 BSA1 CI1 HGB1 MAP1
/PRINT=TWOTAIL NOSIG
/MISSING=LISTWISE .
GRAPH
/SCATTERPLOT(MATRIX)=SBP2 SBP1 BSA1 CI1 HGB1 MAP1
/MISSING=LISTWISE .
/*Question 12: Multiple Linear Regression*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER SBP1 BSA1 CI1 HGB1 MAP1 .
/* Rerun the regression without Map1 as a predictor*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER SBP1 BSA1 CI1 HGB1
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
************************************************************************************************************;
/*Question 2: Descriptive Statistics*/
DESCRIPTIVES
VARIABLES=IDNUM AGE HEIGHT SEX SURVIVE SHOKTYPE SBP1 MAP1 HRT1 DBP1 CVP1
BSA1 CI1 APP1 CIRC1 UR1 PLAS1 RC1 HGB1 HCT1 TIME1 SBP2 MAP2 HRT2 DBP2 CVP2
BSA2 CI2 APP2 CIRC2 UR2 PLAS2 RC2 HGB2 HCT2 TIME2 SHOCK DIED shock_dum2
shock_dum3 shock_dum4 shock_dum5 shock_dum6 shock_dum7 sbpdiff
/STATISTICS=MEAN STDDEV MIN MAX .
There are 104 cases with all variables complete in this data set. Values appear to be reasonable for all variables.
/*Question 3: Frequencies*/
FREQUENCIES
VARIABLES=SHOKTYPE SHOCK SURVIVE DIED SEX
/ORDER= ANALYSIS .
These values look good. There’s not much to discuss here.
/*Question 4: Histograms of SBP1 for those who lived and those who died*/
GRAPH
/HISTOGRAM=SBP1
/PANEL ROWVAR=SURVIVE ROWOP=CROSS .
The histogram of SBP1 shows that SBP1 is higher in general for those who lived than for those who died. There is a fair amount of overlap in these values for the two groups of patients, and the amount of variability appears to be similar.
/*Question 5: Independent samples t-test on SBP1 and SBPDIFF */
T-TEST
GROUPS = SURVIVE(1 3)
/MISSING = ANALYSIS
/VARIABLES = SBP1 sbpdiff
/CRITERIA = CI(.95) .
There is a significant difference in the means of both SBP1 and SBPDIFF for those who lived vs. those who died. The mean of SBP1 is higher for those who lived than for those who died (t(109) = 3.871, p < .001). The patients who lived had an average increase in SBP from time 1 to time 2 of 16.31 units, and those who died had an average decrease in SBP from time 1 to time 2 of 14.33 units (t(109) = 4.838, p < .001).
/*Question 6: Paired t-test of SBP1 vs SBP2 */
T-TEST
PAIRS = SBP1 WITH sbp2 (PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.
For all patients taken together, there is no significant difference in the mean of SBP1 vs. SBP2 (t(110) = -1.312, p = .192).
/*Question 6: Redo the paired t-test for those who died and those who lived*/
SORT CASES BY died .
SPLIT FILE
SEPARATE BY died .
T-TEST
PAIRS = SBP1 WITH sbp2 (PAIRED)
/CRITERIA = CI(.95)
/MISSING = ANALYSIS.
SPLIT FILE
OFF.
Note: for those who lived, there was a significant increase in SBP from time 1 to time 2 (t(67) = -4.33, p < .001), but for those who died, there was a significant decrease in SBP from time 1 to time 2 (t(42) = 2.633, p = .012). The paired difference is SBP1 minus SBP2, so a negative t indicates an increase. This explains why there was no significant difference across all patients: some were increasing and others were decreasing.
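A paired t-test is the same as a one-sample t-test of the within-patient difference against zero. As a rough cross-check (a sketch, assuming sbpdiff = sbp2 - sbp1 from the Compute step is still in the active data set), the split-file analysis could be repeated on sbpdiff. The t statistics should have the same magnitude as the paired results but the opposite sign, because the PAIRS subcommand works with SBP1 minus SBP2.

```spss
* Sketch only: one-sample t-test of sbpdiff against 0, run separately by DIED.
* Assumes sbpdiff = sbp2 - sbp1 has already been computed.
SORT CASES BY died.
SPLIT FILE SEPARATE BY died.
T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=sbpdiff
  /CRITERIA=CI(.95).
SPLIT FILE OFF.
```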
/*Question 7: Scatterplot of SBP2 vs. SBP1*/
GRAPH
/SCATTERPLOT(BIVAR)=SBP1 WITH sbp2 BY died
/MISSING=LISTWISE .
Note: I included markers by Died, but this is not necessary for this problem. The overall regression line shows that there is a linear relationship between SBP at time 1 and SBP at time 2.
8. Carry out a simple linear regression, with SBP2 as the dependent variable, and SBP1 as the only predictor.
a) Get a plot of residuals vs. predicted values to check homogeneity of variance.
b) Get a histogram and normal P-P plot to check the normality of the residuals.
c) Save the Unstandardized Predicted values, Unstandardized Residuals, and Studentized Deleted Residuals as new variables (using the Save… button at the bottom of the Regression Window).
d) Include your regression output and the diagnostic plots in your homework.
The linear regression shows that there is a significant linear relationship between SBP1 and SBP2. The estimated coefficient is .546, so we estimate that the mean of SBP2 is .546 units higher for a patient with a one-unit higher SBP1 (t(109) = 5.348, p < .001).
/*Question 8: Linear Regression of SBP2 on SBP1*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT sbp2
/METHOD=ENTER SBP1
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
The histogram and normal p-p plot show that the residuals are reasonably normally distributed.
The plot of residuals vs. predicted values below shows that the assumption of equality of variances is reasonably met for this analysis.
/*Question 9: Boxplots of SBP2 by levels of SHOKTYPE*/
EXAMINE
VARIABLES=SBP2 BY SHOKTYPE /PLOT=BOXPLOT/STATISTICS=NONE/NOTOTAL.
The boxplot below shows that SBP2 is generally lower in each of the shock categories than in the non-shock patients.
10. Carry out a regression with the dummy variables for SHOKTYPE as the predictors and SBP2 as the dependent variable.
- Use Non-shock as the reference category.
- Get a plot of residuals vs. predicted values to check homogeneity of variance.
- Get a histogram and normal P-P plot to check the normality of the residuals.
- Include your regression output and the diagnostic plots in your homework.
/*Question 10: Regression of SBP2 on SHOKTYPE dummy variables*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER shock_dum3 shock_dum4 shock_dum5 shock_dum6 shock_dum7
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
The overall regression model is highly significant (F(5, 107) = 3.5, p = .006). There is a significant difference in the mean of SBP2 between each of the SHOKTYPE groups 3, 4, 6, and 7 and the non-shock patients. Each of these shock types is predicted to have a lower mean SBP2 than the non-shock group. The difference between SHOKTYPE = 5 and non-shock is not significant (t(107) = -1.760, p = .081).
The residuals appear to be reasonably normally distributed, although there is some evidence of a slight positive skewness in the residuals. The plot of residuals vs. predicted values shows that the assumption of homoskedasticity of residuals is reasonable for this model.
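A regression on the five dummy variables is equivalent to a one-way ANOVA of SBP2 across the six SHOKTYPE groups, so the overall F test can be cross-checked that way. The following is a sketch of one way to do that, not part of the assigned commands. Each dummy-variable coefficient should equal that group's mean SBP2 minus the non-shock mean, which the descriptives table makes easy to verify.

```spss
* Sketch only: one-way ANOVA of SBP2 across the six SHOKTYPE groups.
* The overall F test should agree with the ANOVA table from the dummy-variable regression.
ONEWAY SBP2 BY SHOKTYPE
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS.
```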
11. Create a Pearson correlation matrix with the variables SBP2, SBP1, BSA1, CARDIAC1 (CI1 in this data set), HGB1, and MAP1.
a) Use listwise deletion for the variables.
b) Create a scatterplot matrix for these variables.
c) Include the correlation matrix and the scatterplot matrix in your homework output.
/*Question 11: Correlation matrix and scatterplot matrix*/
CORRELATIONS
/VARIABLES=SBP2 SBP1 BSA1 CI1 HGB1 MAP1
/PRINT=TWOTAIL NOSIG
/MISSING=LISTWISE .
GRAPH
/SCATTERPLOT(MATRIX)=SBP2 SBP1 BSA1 CI1 HGB1 MAP1
/MISSING=LISTWISE .
12. Carry out a multiple regression with SBP2 as the dependent variable and SBP1, BSA1, CARDIAC1, HGB1, and MAP1 as predictors.
- Check the collinearity diagnostics for this model.
- Include the model output, along with the collinearity diagnostics in your homework.
- Rerun the model, but remove MAP1 as a predictor.
- Check collinearity for this new model.
- Include both of these linear regression outputs in your homework. Include diagnostic plots only for the second model in your homework.
/*Question 12: Multiple Linear Regression*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER SBP1 BSA1 CI1 HGB1 MAP1 .
Based on the results of this linear regression model, we can see that SBP1 and MAP1 are collinear: the condition index is > 30 (32.040), and the proportions of variance for these two variables on the last row of the collinearity diagnostics table are both > .5 (proportion of variance = .92 for both SBP1 and MAP1 on the last eigenvector). We also see that the VIFs for both SBP1 and MAP1 are about 7.5, which is another indicator of collinearity.
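The VIF for a predictor is 1 / (1 - R-squared), where R-squared comes from regressing that predictor on all the other predictors. A VIF of about 7.5 therefore corresponds to an R-squared of roughly 1 - 1/7.5 = .87. As a sketch of how this could be checked directly (not part of the assigned commands), SBP1 can be regressed on the other four predictors. Listwise deletion here does not involve SBP2, so the sample may differ slightly from the main model if SBP2 has missing values.

```spss
* Sketch only: auxiliary regression of SBP1 on the other four predictors.
* The R Square from this model is the R-squared that enters the VIF for SBP1.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS R
  /DEPENDENT SBP1
  /METHOD=ENTER BSA1 CI1 HGB1 MAP1.
```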
/* Rerun the regression without Map1 as a predictor*/
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT SBP2
/METHOD=ENTER SBP1 BSA1 CI1 HGB1
/SCATTERPLOT=(*SDRESID ,*ZPRED )
/RESIDUALS HIST(ZRESID) NORM(ZRESID)
/SAVE PRED RESID SDRESID .
Collinearity is not a problem in this model. The condition index is 27.678, which is less than 30.
The only significant predictor is SBP1 (t(102) = 4.598, p < .001). The residuals appear to be somewhat negatively skewed, but not severely. The P-P plot shows that the residuals are reasonably normally distributed. The plot of residuals vs. predicted values is reasonable.