Rohini Vij

ID 3980158

STAT262 Problem Set 3 5/15/02

Stat262 PS3 solutions (adapted from Rohini Vij and Liang Qiao)

General comment: there seems to be a bug in R which causes it to ignore second order terms. Thus, some of you inputted the correct thing but got wrong answers. This is a useful lesson in the unreliability of software and the superiority of the human mind, which is responsible for examining software output and determining that it worked properly...

  1. Regression of satisfaction, sat, on all three predictor variables: age, anxiety and severity:

sat.hat = 162.876 – 1.21 (age) – 0.666 (severity) – 8.613 (anxiety)

Regression of sat on one predictor:

sat.hat = 121.83 – 1.527 (age)

sat.hat = 173.6 – 2.21 (severity)

sat.hat = 137.4 – 33.143 (anxiety)

Table of p-values from the regressions:

Variable / p-val 1 predictor / p-val 3 predictors
Age / 0.000015 / 0.0007
Severity / 0.00321 / 0.43
Anxiety / 0.00236 / 0.49

Age is significant at the 0.05 level in both cases. Severity and anxiety are each significant at the 0.05 level in the 1-predictor regressions, but not in the regression including all 3 predictors. No variable is insignificant in both cases.

The correct interpretation of these results is:

Each variable is significantly correlated with satisfaction on its own. However, in the model with all 3 predictors, the effect of each of severity and anxiety can be accounted for by the other two predictors combined.

To make statements about the specific form of the inter-relationships between the predictors, we need more information (in particular, we would need to look at regressions with pairs of predictors).

  2. Model based on regression of creatinine clearance on age, weight, serum creatinine:

clearance.hat = 120.05 – 0.737(age) + 0.776(weight) – 39.94(creatinine)

R-Squared: 0.8548, Adjusted R-squared: 0.8398

Regression of creatinine clearance on serum creatinine:

clearance.hat = 154.66 – 55.56(creatinine)

R-Squared: 0.6429, Adjusted R-squared: 0.6314

Full quadratic model:

clearance.hat = 68.4 – 1.22(age) + 3.31(weight) – 93.9(creatinine) + 0.0088(age^2) – 0.012(weight^2) + 8.645(creatinine^2) – 0.012(age*weight) + 0.423(age*creatinine) – 0.031(weight*creatinine)

R-Squared: 0.894, Adjusted R-squared: 0.8526

We can see that:

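-R-squared is monotone in model complexity: it never decreases as terms are added (0.6429, 0.8548, 0.894)

-Adjusted R-squared is always smaller than R-squared

-In this case, the adjusted R-squared is also monotone (0.6314, 0.8398, 0.8526), which suggests that the more complex models are indeed more adequate for this data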
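The relationship between the two measures can be checked with the usual formula, adjusted R-squared = 1 – (1 – R-squared)(n – 1)/(n – p – 1). The sketch below is only illustrative: the sample size n = 33 is not stated in these solutions and is an assumption, chosen because it reproduces the three reported adjusted values; the function and variable names are likewise hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for a model with p predictors (plus an intercept) fit to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumption: n = 33 is inferred from the reported values, not given in the text.
N_CASES = 33

# (model label, reported R-squared, number of predictors)
models = [
    ("creatinine only", 0.6429, 1),
    ("three predictors", 0.8548, 3),
    ("full quadratic", 0.894, 9),
]

adjusted = {name: adjusted_r_squared(r2, N_CASES, p) for name, r2, p in models}

for name, value in adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.4f}")
```

The penalty factor (n – 1)/(n – p – 1) grows with p, which is why the gap between R-squared and adjusted R-squared is largest for the 9-term quadratic model.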

  3. 5 cases are missing data. The data for age is missing for cases 25, 37, 272, 437. Data for SBP and DBP are missing for case 284.
  4. Model for SBP allowing for different intercepts for men and women, and for different slopes for men and women w.r.t. age:

SBP.hat = 99.15 + 0.582(age) + 15.1175(male) – 4.353(src) – 0.21(age*male)

Where male is a dummy variable with value 0 for females, and 1 for males.

For the sample from electoral rolls (src=0):

The estimate of intercept for men is (99.15+15.1175) = 114.27

The estimate of intercept for women is 99.15

For the sample from working subjects:

The estimate of intercept for men is (99.15+15.1-4.4) = 109.9

The estimate of intercept for women is 94.8

The estimate of slope for age for men = 0.582-0.21 = 0.372

The estimate of slope for age for women = 0.582
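The group-specific intercepts and slopes above are just sums of the relevant coefficients. A minimal sketch of that arithmetic, using the unrounded estimates from the R output (the function name `group_line` is hypothetical):

```python
# Coefficient estimates for the SBP model, as reported in the R output.
COEF = {
    "intercept": 99.15044,
    "age": 0.58219,
    "male": 15.11746,
    "src": -4.35263,
    "age:male": -0.21040,
}


def group_line(male, src, coef=COEF):
    """Return (intercept, slope in age) for a group defined by the 0/1 dummies male and src."""
    intercept = coef["intercept"] + coef["male"] * male + coef["src"] * src
    slope = coef["age"] + coef["age:male"] * male
    return intercept, slope


for src in (0, 1):
    for male in (1, 0):
        intercept, slope = group_line(male, src)
        label = "men" if male else "women"
        print(f"src={src} {label}: intercept={intercept:.2f}, slope={slope:.3f}")
```

Because the model has no src*age term, the slope depends only on sex, while src shifts the intercept by the same amount for both sexes.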

The coefficient estimates, t-statistics and p-values are:

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 99.15044 5.33157 18.597 < 2e-16 ***

age 0.58219 0.09655 6.030 3.23e-09 ***

male 15.11746 4.72982 3.196 0.00148 **

src -4.35263 2.36682 -1.839 0.06652 .

age:male -0.21040 0.10481 -2.007 0.04525 *

Thus, the coefficients for age and male are statistically significant at all conventional testing values. The coefficient for age*male is statistically significant at the (two-tailed) 5% significance level, but not at the 1% level. The coefficient for src is not statistically significant at the 5% (two-tailed) level.
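As a check on the table, each t value is the estimate divided by its standard error. The sketch below recomputes them and, as a rough guide only, attaches two-sided p-values from a standard normal approximation; R uses the t distribution with the residual degrees of freedom, so these approximate p-values will differ slightly from the ones printed above. The helper names are hypothetical.

```python
import math

# (estimate, standard error) pairs copied from the R output.
ROWS = {
    "(Intercept)": (99.15044, 5.33157),
    "age": (0.58219, 0.09655),
    "male": (15.11746, 4.72982),
    "src": (-4.35263, 2.36682),
    "age:male": (-0.21040, 0.10481),
}


def t_value(estimate, std_error):
    """t statistic for H0: coefficient = 0."""
    return estimate / std_error


def normal_two_sided_p(t):
    """Two-sided p-value under a standard normal approximation (not the exact t distribution)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


t_values = {name: t_value(est, se) for name, (est, se) in ROWS.items()}
approx_p = {name: normal_two_sided_p(t) for name, t in t_values.items()}

for name in ROWS:
    print(f"{name:12s} t = {t_values[name]:7.3f}   approx. p = {approx_p[name]:.4f}")
```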

The R-squared for the model is 0.2064.

Residual MSE is 15.71

  5. Mean SBP for sample from electoral rolls (src=0):

AGE SBP_Men SBP_Women

25 123.56 113.71

50 132.86 128.26

75 142.15 142.82

Mean SBP for sample from working subjects (src=1)

AGE SBP_Men SBP_Women

25 119.2 109.3

50 128.5 123.9

75 137.8 138.5

The yearly increase in mean SBP for men is 0.372; the yearly increase for women is 0.582.
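The tabulated means follow from plugging each age, sex and source into the fitted equation of problem 4. A short sketch that reproduces both tables from the unrounded coefficients (the function name `predicted_sbp` is hypothetical):

```python
# Coefficient estimates for the SBP model of problem 4, from the R output.
COEF = {
    "intercept": 99.15044,
    "age": 0.58219,
    "male": 15.11746,
    "src": -4.35263,
    "age:male": -0.21040,
}


def predicted_sbp(age, male, src, coef=COEF):
    """Estimated mean SBP for the given age and 0/1 dummies male and src."""
    return (
        coef["intercept"]
        + coef["age"] * age
        + coef["male"] * male
        + coef["src"] * src
        + coef["age:male"] * age * male
    )


for src in (0, 1):
    print(f"src={src}")
    print("AGE  SBP_Men  SBP_Women")
    for age in (25, 50, 75):
        men = predicted_sbp(age, male=1, src=src)
        women = predicted_sbp(age, male=0, src=src)
        print(f"{age:3d}  {men:7.2f}  {women:9.2f}")
```

Since women start lower but have the steeper slope, the two lines cross between ages 50 and 75, which is why women have the higher estimated mean at age 75.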

6. Model for SBP for men with age and src as predictors:

SBP.hat = 110.1 + 0.413 (age) – 1.5385(src)

(s.e) (0.057) (2.93)

R-Squared: 0.1274, Adjusted R-squared: 0.1225

Residual MSE: 15.67

So we see that for men only, age seems to be the useful predictor, while src seems to have no effect in addition to that of age.

The overall R-squared is lower than that for the combined model, but the residual variance is essentially the same, indicating that this model explains essentially the same amount of “male” variability as the model of problem 4.

Model for SBP for women with age and src as predictors:

SBP.hat = 107.26 + 0.48 (age) – 9.609 (src)

(s.e) (0.12) (4.0)

R-Squared: 0.3386, Adjusted R-squared: 0.3287

Residual MSE: 15.70

We see that for women, src is a significant predictor and carries useful information about the SBP. This regression manages to explain more of the variance in women – about 34%.
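The contrast between the two sexes can be seen from the ratio of each src estimate to its standard error. The sketch below uses the rounded values quoted in this problem, so the ratios are approximate, and it compares them with 1.96 as a rough large-sample cutoff for two-sided 5% significance; that cutoff is an assumption of the sketch, not the exact critical value R would use.

```python
# (estimate, standard error) for age and src, as quoted for each single-sex model.
MEN = {"age": (0.413, 0.057), "src": (-1.5385, 2.93)}
WOMEN = {"age": (0.48, 0.12), "src": (-9.609, 4.0)}

CUTOFF = 1.96  # assumption: large-sample normal cutoff for a two-sided 5% test


def t_ratios(model):
    """Estimate divided by standard error for each predictor."""
    return {name: est / se for name, (est, se) in model.items()}


men_t = t_ratios(MEN)
women_t = t_ratios(WOMEN)

for label, ratios in (("men", men_t), ("women", women_t)):
    for name, t in ratios.items():
        verdict = "significant" if abs(t) > CUTOFF else "not significant"
        print(f"{label:5s} {name:3s}: t = {t:6.2f} ({verdict} at roughly the 5% level)")
```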

However notice that the residual MSE is essentially the same for all 3 models:

-combined model: 15.71

-men model: 15.67

-women model: 15.70

This implies that allowing a different src effect for each sex (the only difference between the model in problem 4 and the two models in problem 6) contributes almost nothing to our ability to predict SBP. We could test this hypothesis directly by adding a sex*src interaction to the model of problem 4, and testing for significance of this coefficient.

7. Model for DBP for men with age, age^2, and src as predictors:

DBP.hat = 44.58 + 1.159(age) – 0.011(age^2) + 6.406(src)

(s.e.) (0.22) (0.002) (2.18)

R-Squared: 0.122, Adjusted R-squared: 0.1146

Residual s.e.: 10.57 on 355 df

The coefficients for age and for age^2 are individually and jointly statistically significant (the t-statistic for the coefficient of age is 5.2, the t-statistic for the coefficient of age^2 is -4.3, and the F-statistic for H0: coefficient of age = 0 and coefficient of age^2 = 0 is 19.16), suggesting that there is an association between age and DBP for men.

Model for DBP for women with age, age^2 and src as predictors:

DBP.hat = 60.19 + 0.346(age) – 0.0023(age^2) + 2.407(src)

(s.e) (0.3) (0.003) (2.67)

R-Squared: 0.029, Adjusted R-squared: 0.0066

Residual standard error: 10.33 on 132 df

F statistic p-val for the hypothesis that all coefficients are 0: 0.28

So we can say that there is no evidence of a relationship between our predictors and DBP in females!
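The overall F statistic behind that p-value can be recovered from R-squared and the degrees of freedom, F = (R-squared/k) / ((1 – R-squared)/df_resid). The sketch below is a rough check only: it uses the rounded R-squared of 0.029, takes k = 3 predictors and 132 residual degrees of freedom from the output above, and does not recompute the p-value, which would need the F distribution. The function name is hypothetical.

```python
def overall_f_statistic(r2, k, df_resid):
    """F statistic for H0: all k slope coefficients are zero, computed from R-squared."""
    return (r2 / k) / ((1.0 - r2) / df_resid)


# Women's DBP model: rounded R-squared, 3 predictors, 132 residual df (from the output above).
f_women = overall_f_statistic(0.029, k=3, df_resid=132)
print(f"F(3, 132) = {f_women:.2f}")
```

An F value this close to 1 means the model explains about as much variation per degree of freedom as the residual noise, which is consistent with the reported p-value of 0.28.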

The mean squared errors of the residuals for the separate regression models are similar (10.57^2 vs. 10.33^2), suggesting that the variance of the residuals, as estimated by the MSE, is similar for men and women (although for women our predictors seem useless!).

The R-squared is small for both models, so we would need many more predictors for reliable estimation of the DBP.

Including a quadratic term in age indicates that the effect of age on DBP diminishes as age increases. Differentiating both equations with respect to age:

For men: d(DBP)/d(age) = 1.159 – 2(0.011)(age)

For women: d(DBP)/d(age) = 0.346 – 2(0.0023)(age)

Setting each derivative to zero, the maximum mean DBP for men is estimated to occur at age 52.7 (1.159/(2x0.011)). The maximum mean DBP for women is estimated to occur at age 75.2 (0.346/(2x0.0023)); thus, DBP peaks at a later age for women than for men. Equating the two derivatives shows that the rate of change of DBP with age is greater for men at ages below 46.7; at ages above 46.7, it is greater for women than for men.
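The peak ages and the crossover age come from simple algebra on the two derivatives. A minimal sketch using the rounded coefficients quoted above (the function names are hypothetical):

```python
# (linear, quadratic) age coefficients from the two DBP models.
MEN = (1.159, -0.011)
WOMEN = (0.346, -0.0023)


def peak_age(linear, quadratic):
    """Age at which linear + 2*quadratic*age equals zero (the vertex of the parabola)."""
    return -linear / (2.0 * quadratic)


def equal_slope_age(model_a, model_b):
    """Age at which the two models have the same rate of change of DBP with age."""
    (lin_a, quad_a), (lin_b, quad_b) = model_a, model_b
    return (lin_a - lin_b) / (2.0 * (quad_b - quad_a))


peak_men = peak_age(*MEN)
peak_women = peak_age(*WOMEN)
crossover = equal_slope_age(MEN, WOMEN)

print(f"peak age, men:   {peak_men:.1f}")
print(f"peak age, women: {peak_women:.1f}")
print(f"equal-slope age: {crossover:.1f}")
```

Because the quadratic coefficients are small and reported to only one or two significant digits, the resulting ages are sensitive to rounding and should be read as approximate.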