Pollution and Mortality Stepwise Selection (Part I)

Output from Stepwise (menu version)

*** Stepwise Regression ***

*** Stepwise Model Comparisons ***

Start: AIC= 98206.93

MORTALITY ~ PRECIP + HUMIDITY + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR

Single term deletions

Model:

MORTALITY ~ PRECIP + HUMIDITY + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63229.12     98206.9
PRECIP        1     4884.88    68114.00    100401.2
HUMIDITY      1       14.46    63243.58     95530.8
JANTEMP       1     6316.09    69545.21    101832.4
JULYTEMP      1     3747.87    66976.99     99264.2
OVER65        1      415.17    63644.29     95931.5
HOUSE         1     2741.69    65970.81     98258.0
EDUC          1     4660.46    67889.57    100176.8
SOUND         1       57.89    63287.01     95574.2
DENSITY       1     2189.56    65418.68     97705.9
NONWHITE      1    33120.46    96349.58    128636.8
WHITECOL      1       79.36    63308.48     95595.7
POOR          1       63.04    63292.16     95579.4
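The Cp column (reported as AIC in the Start and Step lines) can be reproduced from the printed numbers. The following is a minimal sketch in R, assuming the statistic has the form RSS + 2 * p * scale, where p counts the intercept plus the predictors and scale is the residual mean square of the full model. The sample size n = 60 is also an assumption, inferred from the 53 residual degrees of freedom reported for the final 7-coefficient fit. All object names are illustrative.

```r
# Values copied from the output above; n is inferred, not printed
n        <- 60         # assumed: 53 residual df + 7 coefficients in the final fit
rss_full <- 63229.12   # RSS of the 12-predictor model (<none> row)
p_full   <- 13         # 12 predictors + intercept

# Residual mean square of the full model (printed as "scale: 1345.3")
scale <- rss_full / (n - p_full)

# Assumed form of the Cp-type statistic used by the stepwise search
cp_stat <- function(rss, p) rss + 2 * p * scale

cp_start  <- cp_stat(rss_full, p_full)      # compare with "Start: AIC= 98206.93"
cp_no_hum <- cp_stat(63243.58, p_full - 1)  # HUMIDITY row, compare with 95530.8

print(round(c(scale = scale, start = cp_start, drop_HUMIDITY = cp_no_hum), 2))
```

Because scale is held fixed at the full-model value throughout the search, dropping a term lowers the statistic whenever its Sum of Sq is less than 2 * scale.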

Step: AIC= 95530.79

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63243.58     95530.8
PRECIP        1     5040.61    68284.19     97880.8
JANTEMP       1     6616.78    69860.36     99457.0
JULYTEMP      1     4621.08    67864.66     97461.3
OVER65        1      406.09    63649.67     93246.3
HOUSE         1     2727.89    65971.48     95568.1
EDUC          1     4718.21    67961.79     97558.4
SOUND         1       51.42    63295.01     92891.6
DENSITY       1     2249.71    65493.29     95089.9
NONWHITE      1    33121.69    96365.27    125961.9
WHITECOL      1       77.44    63321.02     92917.6
POOR          1       57.75    63301.33     92897.9

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + SOUND + DENSITY + NONWHITE + WHITECOL + POOR

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63243.58    95530.79
HUMIDITY      1     14.4627    63229.12    98206.93

Step: AIC= 92891.62

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL + POOR

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL + POOR

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63295.01     92891.6
PRECIP        1     5155.58    68450.59     95356.6
JANTEMP       1     9717.81    73012.82     99918.8
JULYTEMP      1     4711.22    68006.23     94912.2
OVER65        1      435.81    63730.81     90636.8
HOUSE         1     2757.86    66052.87     92958.9
EDUC          1     5428.63    68723.63     95629.6
DENSITY       1     2209.05    65504.06     92410.1
NONWHITE      1    33200.71    96495.72    123401.7
WHITECOL      1       57.79    63352.80     90258.8
POOR          1       15.60    63310.60     90216.6

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL + POOR

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63295.01    92891.61
HUMIDITY      1     7.99578    63287.01    95574.22
SOUND         1    51.42473    63243.58    95530.79

Step: AIC= 90216.61

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                          63310.6     90216.6
PRECIP        1     5333.89     68644.5     92859.9
JANTEMP       1    14436.46     77747.1    101962.5
JULYTEMP      1     5788.63     69099.2     93314.6
OVER65        1      571.09     63881.7     88097.1
HOUSE         1     2892.42     66203.0     90418.4
EDUC          1     5655.27     68965.9     93181.3
DENSITY       1     2661.80     65972.4     90187.8
NONWHITE      1    46479.10    109789.7    134005.1
WHITECOL      1       53.60     63364.2     87579.6

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE + WHITECOL

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63310.60    90216.61
HUMIDITY      1     7.54469    63303.06    92899.67
SOUND         1     9.27203    63301.33    92897.94
POOR          1    15.59837    63295.01    92891.61

Step: AIC= 87579.61

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                          63364.2     87579.6
PRECIP        1     5425.43     68789.6     90314.4
JANTEMP       1    14469.14     77833.3     99358.2
JULYTEMP      1     6095.32     69459.5     90984.3
OVER65        1      589.85     63954.1     85478.9
HOUSE         1     2839.41     66203.6     87728.4
EDUC          1     9826.79     73191.0     94715.8
DENSITY       1     2608.21     65972.4     87497.2
NONWHITE      1    47493.24    110857.4    132382.2

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + OVER65 + HOUSE + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63364.21    87579.61
HUMIDITY      1     7.40322    63356.80    90262.81
SOUND         1     5.37617    63358.83    90264.84
WHITECOL      1    53.60301    63310.60    90216.61
POOR          1    11.40700    63352.80    90258.81

Step: AIC= 85478.86

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + HOUSE + EDUC + DENSITY + NONWHITE

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + HOUSE + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                          63954.1     85478.9
PRECIP        1     5421.96     69376.0     88210.2
JANTEMP       1    14954.43     78908.5     97742.7
JULYTEMP      1     5601.36     69555.4     88389.6
HOUSE         1     2562.10     66516.2     85350.4
EDUC          1     9758.83     73712.9     92547.1
DENSITY       1     2983.22     66937.3     85771.5
NONWHITE      1    65702.71    129656.8    148491.0

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + HOUSE + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         63954.06    85478.86
HUMIDITY      1      0.0990    63953.96    88169.37
OVER65        1    589.8494    63364.21    87579.61
SOUND         1      3.5439    63950.51    88165.92
WHITECOL      1     72.3655    63881.69    88097.10
POOR          1    139.3969    63814.66    88030.07

Step: AIC= 85350.36

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + EDUC + DENSITY + NONWHITE

Single term deletions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                          66516.2     85350.4
PRECIP        1     6681.27     73197.4     89341.0
JANTEMP       1    12537.37     79053.5     95197.1
JULYTEMP      1     6093.75     72609.9     88753.5
EDUC          1     7403.87     73920.0     90063.6
DENSITY       1     6540.34     73056.5     89200.1
NONWHITE      1    73230.76    139746.9    155890.5

Single term additions

Model:

MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + EDUC + DENSITY + NONWHITE

scale: 1345.3

             Df   Sum of Sq         RSS          Cp
<none>                         66516.16    85350.36
HUMIDITY      1       0.298    66515.86    88040.67
OVER65        1     312.544    66203.61    87728.42
HOUSE         1    2562.100    63954.06    85478.86
SOUND         1       6.230    66509.93    88034.73
WHITECOL      1      19.630    66496.53    88021.33
POOR          1      30.794    66485.36    88010.17
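The search ends at this six-predictor model because no single deletion or addition yields a Cp below the current value of 85350.36. A minimal check in R, with the Cp values copied from the two tables above (object names are illustrative):

```r
# Cp of the current model and of every one-term move, copied from the output
cp_current <- 85350.36
cp_drop <- c(PRECIP = 89341.0, JANTEMP = 95197.1, JULYTEMP = 88753.5,
             EDUC = 90063.6, DENSITY = 89200.1, NONWHITE = 155890.5)
cp_add  <- c(HUMIDITY = 88040.67, OVER65 = 87728.42, HOUSE = 85478.86,
             SOUND = 88034.73, WHITECOL = 88021.33, POOR = 88010.17)

# The stepwise search stops when no candidate move improves on the current Cp
cp_moves     <- c(cp_drop, cp_add)
search_stops <- min(cp_moves) >= cp_current
closest      <- names(which.min(cp_moves))   # nearest competitor

print(search_stops)
print(closest)
```

The nearest competitor is the model that adds HOUSE back, which is the model the search had just left.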

*** Linear Model ***

Call: lm(formula = MORTALITY ~ PRECIP + JANTEMP + JULYTEMP + EDUC + DENSITY + NONWHITE, data = ex1217, na.action = na.exclude)

Residuals:

    Min      1Q  Median     3Q    Max
 -80.68  -21.53   1.424  22.77  83.05

Coefficients:

                 Value  Std. Error  t value  Pr(>|t|)
(Intercept)  1242.4388    123.2833  10.0779    0.0000
PRECIP          1.4013      0.6073   2.3073    0.0250
JANTEMP        -1.6845      0.5330  -3.1607    0.0026
JULYTEMP       -2.8403      1.2890  -2.2035    0.0319
EDUC          -16.1578      6.6524  -2.4289    0.0186
DENSITY         0.0076      0.0033   2.2828    0.0265
NONWHITE        5.2753      0.6906   7.6387    0.0000

Residual standard error: 35.43 on 53 degrees of freedom

Multiple R-Squared: 0.7086

F-statistic: 21.48 on 6 and 53 degrees of freedom, the p-value is 1.305e-012
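The summary statistics of the fitted model are tied together by standard identities, which gives a quick consistency check on the output. A small sketch in R using only the printed values (object names are illustrative):

```r
# Values copied from the linear model summary above
r2     <- 0.7086   # Multiple R-Squared
k      <- 6        # number of predictors
df_res <- 53       # residual degrees of freedom

# Overall F statistic recovered from R-squared
f_stat <- (r2 / k) / ((1 - r2) / df_res)

# Upper-tail p-value of the F statistic
p_val <- pf(f_stat, k, df_res, lower.tail = FALSE)

# A t value is the estimate divided by its standard error (NONWHITE row)
t_nonwhite <- 5.2753 / 0.6906

print(round(c(F = f_stat, t_NONWHITE = t_nonwhite), 2))
print(p_val)
```

Small discrepancies from the printed values are expected, since the inputs are rounded to four decimal places.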

Alternative Search using the Command Line:

The function leaps(X, Y, method="Cp") finds the "best" subsets for each number of predictors, ranking the subsets by the chosen model selection criterion: "Cp", "r2", or "adjr2".

The output is a list whose components describe the regression subsets. The components Cp (or adjr2 or r2), size, and label all have the same length, one element per subset; this length equals the number of rows of the component which.

Cp, adjr2, r2

The first returned component is named Cp, adjr2, or r2, depending on the method used to evaluate the subsets. It gives the values of the chosen statistic. If r2 or adjr2 is used, the result is in percent.

size

the number of explanatory variables (including the constant term if int is TRUE) in each subset.

label

a vector of character strings, each element giving the names of the variables in the subset.

which

logical matrix with as many rows as there are returned subsets. Each row is a logical vector that can be used to select the columns of x in the subset.

int

logical value telling whether the which matrix contains, as its first column, the status of the intercept variable.
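To illustrate how these components fit together, here is a small sketch in R. It does not call leaps(); it builds a hypothetical stand-in list with made-up values for three subsets of three variables A, B, and C, and shows how the best subset is located and how a row of which selects columns of the predictor matrix.

```r
# Hypothetical stand-in for a leaps() result; all values are made up
fit <- list(
  Cp    = c(12.4, 3.1, 4.0),             # criterion value per subset
  size  = c(2, 3, 4),                    # coefficients, counting the intercept
  label = c("A", "A,B", "A,B,C"),        # variables in each subset
  which = rbind(c(TRUE, FALSE, FALSE),   # one row per subset
                c(TRUE, TRUE,  FALSE),
                c(TRUE, TRUE,  TRUE))
)
colnames(fit$which) <- c("A", "B", "C")

# Toy predictor matrix with matching column names
x <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("A", "B", "C")))

# Index of the subset with the smallest Cp, and its label
best       <- which.min(fit$Cp)
best_label <- fit$label[best]

# The matching row of 'which' picks out the columns of x in that subset
x_best <- x[, fit$which[best, ], drop = FALSE]

print(best_label)
print(colnames(x_best))
```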

> leaps1217 <- leaps(ex1217[,3:14], ex1217[,2], method="Cp") # X = candidate predictors, Y = response
> plot(leaps1217$size, leaps1217$Cp, ylab="Mallows' Cp", xlab="p (Number of Coefficients in the Model)")
> title("Cp Plot for Mortality Data without Pollution Variables")
> abline(0,1) # reference line Cp = p
> leaps1217$rank <- rank(leaps1217$Cp) # gives the rank of each model
> leaps1217$best <- sort.list(leaps1217$Cp) # leaps1217$best[1] is the index of the best model (number 51)
> leaps1217$label[51] # included variables:
"PRECIP,JANTEMP,JULYTEMP,EDUC,DENSITY,NONWHITE"
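For readers without access to leaps() or the ex1217 data, the same idea can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the leaps() algorithm: it fits every subset by brute force with lm() and uses the standard Mallows' Cp, RSS_p / s2 - n + 2p, with s2 taken from the full model. The data, the variable names X1 to X4, and all object names are invented for the example.

```r
# Simulated data: X1 and X2 matter, X3 and X4 are pure noise (assumption)
set.seed(1)
n <- 60
x <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("X", 1:4)))
y <- 2 + 3 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Residual mean square of the full model serves as the scale estimate
full <- lm(y ~ x)
s2   <- sum(resid(full)^2) / full$df.residual

# All 2^4 - 1 non-empty subsets as rows of a logical matrix
which_mat <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 4)))[-1, ]
colnames(which_mat) <- colnames(x)

# Number of coefficients (predictors + intercept) and Mallows' Cp per subset
size <- rowSums(which_mat) + 1
cp <- apply(which_mat, 1, function(keep) {
  fit <- lm(y ~ x[, keep, drop = FALSE])
  sum(resid(fit)^2) / s2 - n + 2 * (sum(keep) + 1)
})
label <- apply(which_mat, 1,
               function(keep) paste(colnames(x)[keep], collapse = ","))

# Best subset by Cp, and the Cp plot with the reference line Cp = p
best <- sort.list(cp)[1]
print(label[best])
plot(size, cp, ylab = "Mallows' Cp",
     xlab = "p (Number of Coefficients in the Model)")
abline(0, 1)
```

By construction the full model lies exactly on the line Cp = p, so the plot is mainly informative for the smaller subsets.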

Next time: BIC