STEPWISE REGRESSION

Begin by performing a normal multiple regression. If all variables are shown as significant (p-values < α, the chosen significance level), then STOP -- the complete model is good.

If Significance F is low but one or more of the p-values for the t-tests are high, forward stepwise regression can be used to develop the best model that contains only some of the variables, as follows.

Step 1: Do simple linear regressions of y vs. each x variable individually. Select the x variable with the lowest p-value. (Suppose it is X3.)

Step 2: Do all possible 2-variable regressions in which one of the two variables is X3.

  • If none of the 2-variable regressions gives low p-values for both X3 and the other variable -- STOP -- use the model utilizing only X3.
  • If one or more of the 2-variable models gives low p-values for both X3 and the second variable, select the model with the lowest p-values. (Suppose it is the one with X3 and X5.) -- GO TO STEP 3.

Step 3: Do all possible 3-variable regressions in which two of the three variables are X3 and X5.

  • If none of the 3-variable regressions gives low p-values for each of X3, X5, and the other variable -- STOP -- use the model utilizing only X3 and X5.
  • If one or more of the 3-variable models gives low p-values for X3, X5 and the third variable, select the model with the lowest p-values.

GO TO STEP 4 and continue this process.
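
The loop described in Steps 1 through 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the function name `forward_stepwise`, the callback `fit_pvalues`, and the cutoff `alpha = 0.05` are all hypothetical choices (the text only says "low"), and "the model with the lowest p-values" is interpreted here as the model whose largest p-value is smallest. The p-values in the lookup table are made up so that X3 is picked first and X5 second, mirroring the "suppose" cases above.

```python
from typing import Callable, Dict, List, Sequence

PValues = Dict[str, float]


def forward_stepwise(
    candidates: Sequence[str],
    fit_pvalues: Callable[[List[str]], PValues],
    alpha: float = 0.05,  # assumed cutoff for a "low" p-value
) -> List[str]:
    """Add one variable per step while every p-value in the model stays low."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        best_var = None
        best_score = None
        for var in remaining:
            trial = selected + [var]
            pvals = fit_pvalues(trial)
            worst = max(pvals[name] for name in trial)
            # A trial model qualifies only if all of its variables are significant.
            # Among qualifying models, keep the one with the smallest worst p-value
            # (an assumed reading of "the model with the lowest p-values").
            if worst < alpha and (best_score is None or worst < best_score):
                best_var, best_score = var, worst
        if best_var is None:
            break  # STOP: no larger model has all p-values below alpha
        selected.append(best_var)
        remaining.remove(best_var)
    return selected


# Hypothetical p-values (not from any real data set), keyed by the model's variables.
HYPOTHETICAL = {
    frozenset({"X1"}): {"X1": 0.030},
    frozenset({"X3"}): {"X3": 0.001},
    frozenset({"X5"}): {"X5": 0.010},
    frozenset({"X3", "X1"}): {"X3": 0.002, "X1": 0.300},
    frozenset({"X3", "X5"}): {"X3": 0.004, "X5": 0.020},
    frozenset({"X3", "X5", "X1"}): {"X3": 0.005, "X5": 0.030, "X1": 0.400},
}


def lookup(names: List[str]) -> PValues:
    """Stand-in for running a regression: return stored p-values for this model."""
    return HYPOTHETICAL[frozenset(names)]


chosen = forward_stepwise(["X1", "X3", "X5"], lookup)
print(chosen)
```

In practice `fit_pvalues` would run the regression (for example in a spreadsheet or a statistics package) and return the t-test p-value of each x variable; the lookup table only stands in for that step. Note one further assumption: this sketch also requires the first variable to be significant, whereas Step 1 above simply takes the lowest p-value.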

Example

Here is the printout from a model of Y vs. X1, X2, X3, X4, and X5.

Significance F is low (9.16E-06), but two of the p-values are high (X2: 0.209349 and X3: 0.905151).

ANOVA
df / SS / MS / F / Significance F
Regression / 5 / 82624266 / 16524853 / 18.79356 / 9.16E-06
Residual / 14 / 12309961 / 879282.9
Total / 19 / 94934227
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95%
Intercept / -1350.67 / 1326.782 / -1.01801 / 0.325946 / -4196.34 / 1494.996
X1 / 105.1368 / 37.21172 / 2.825368 / 0.013489 / 25.32554 / 184.9481
X2 / -905.579 / 688.1833 / -1.3159 / 0.209349 / -2381.59 / 570.4283
X3 / 4.038254 / 33.28221 / 0.121334 / 0.905151 / -67.3451 / 75.42157
X4 / 732.1831 / 257.4505 / 2.843976 / 0.013003 / 180.0062 / 1284.36
X5 / 23.08303 / 10.08736 / 2.288312 / 0.038187 / 1.447773 / 44.71829
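
As a quick check on how the columns of this printout relate, the following sketch recomputes MS, F, and the t statistics from the printed SS, df, coefficients, and standard errors, and then flags the high p-values. The numbers are copied from the printout; the cutoff `alpha = 0.05` is an assumption, since the text only distinguishes "low" from "high".

```python
# Values copied from the ANOVA part of the printout.
ss_regression, df_regression = 82624266, 5
ss_residual, df_residual = 12309961, 14
ss_total = 94934227

# MS = SS / df, and F = MS(regression) / MS(residual).
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# (coefficient, standard error, printed p-value) for each x variable.
rows = {
    "X1": (105.1368, 37.21172, 0.013489),
    "X2": (-905.579, 688.1833, 0.209349),
    "X3": (4.038254, 33.28221, 0.905151),
    "X4": (732.1831, 257.4505, 0.013003),
    "X5": (23.08303, 10.08736, 0.038187),
}

# t Stat = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

alpha = 0.05  # assumed cutoff for a "high" p-value
high = [name for name, (_, _, p) in rows.items() if p >= alpha]

print(round(f_stat, 4))
print({name: round(t, 4) for name, t in t_stats.items()})
print(high)
```

The recomputed values agree with the printed F and t Stat columns to rounding, and the two flagged variables are the ones that motivate the stepwise procedure.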

Step 1: Do the five 1-variable regressions

X1:

Coefficients / Standard Error / t Stat / P-value
Intercept / 705.574 / 1093.339 / 0.645339 / 0.526849
X1 / 162.3509 / 49.62806 / 3.271353 / 0.004241

X2:

Coefficients / Standard Error / t Stat / P-value
Intercept / 5510 / 455.4713 / 12.09736 / 4.43E-10
X2 / -3298.56 / 678.9765 / -4.85813 / 0.000126

X3:

Coefficients / Standard Error / t Stat / P-value
Intercept / 1829.596 / 943.2457 / 1.939681 / 0.068254
X3 / 130.3296 / 49.62046 / 2.62653 / 0.017116

X4:

Coefficients / Standard Error / t Stat / P-value
Intercept / 33.24607 / 852.302 / 0.039007 / 0.969314
X4 / 1209.819 / 238.2256 / 5.07846 / 7.84E-05

X5:

Coefficients / Standard Error / t Stat / P-value
Intercept / 1921.712 / 1099.356 / 1.748034 / 0.097494
X5 / 42.24776 / 20.0507 / 2.107047 / 0.049403

X4 has the lowest p-value (7.84E-05). Do 2-variable regressions with X4.

Step 2: 2-variable regressions with X4

X4 and X1:

Coefficients / Standard Error / t Stat / P-value
Intercept / -2083.08 / 764.2981 / -2.72548 / 0.014388
X4 / 1062.177 / 170.179 / 6.241527 / 8.94E-06
X1 / 127.3128 / 28.7017 / 4.435724 / 0.000362

X4 and X2:

Coefficients / Standard Error / t Stat / P-value
Intercept / 2381.845 / 1156.512 / 2.059508 / 0.0551
X4 / 764.6601 / 266.6007 / 2.868185 / 0.010657
X2 / -1954.61 / 740.6114 / -2.63918 / 0.017223

X4 and X3:

Coefficients / Standard Error / t Stat / P-value
Intercept / -271.984 / 890.1006 / -0.30556 / 0.763646
X4 / 1059.013 / 272.7572 / 3.882622 / 0.001196
X3 / 47.64925 / 42.83959 / 1.112271 / 0.281504

X4 and X5:

Coefficients / Standard Error / t Stat / P-value
Intercept / -529.912 / 957.4169 / -0.55348 / 0.587141
X4 / 1099.614 / 251.4775 / 4.372614 / 0.000415
X5 / 18.61115 / 15.15154 / 1.228334 / 0.236057

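The models with X4 and X1 and with X4 and X2 both give low p-values for both variables; the model with X4 and X1 has the lowest p-values. Do 3-variable regressions with X1 and X4.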

Step 3: 3-variable regressions with X1 and X4

X1, X4, and X2:

Coefficients / Standard Error / t Stat / P-value
Intercept / -915.611 / 1400.646 / -0.65371 / 0.522586
X1 / 108.5795 / 34.33533 / 3.162327 / 0.006037
X4 / 921.6408 / 221.2157 / 4.166254 / 0.000728
X2 / -712.454 / 716.1868 / -0.99479 / 0.334647

X1, X4, and X3:

Coefficients / Standard Error / t Stat / P-value
Intercept / -2105.84 / 780.2997 / -2.69876 / 0.015812
X1 / 136.6601 / 33.26264 / 4.108516 / 0.000822
X4 / 1116.86 / 196.6308 / 5.679982 / 3.41E-05
X3 / -20.7029 / 35.00935 / -0.59135 / 0.562546

X1, X4, and X5:

Coefficients / Standard Error / t Stat / P-value
Intercept / -2782.66 / 761.0356 / -3.65641 / 0.00213
X1 / 130.5134 / 25.98578 / 5.022496 / 0.000125
X4 / 931.9743 / 164.9015 / 5.651702 / 3.61E-05
X5 / 21.36134 / 9.745077 / 2.192014 / 0.043515

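Only the model with X1, X4, and X5 gives low p-values for all three variables. Do 4-variable models that include X1, X4, and X5.

Step 4: 4-variable regressions with X1, X4, and X5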

X1, X4, X5, and X2:

Coefficients / Standard Error / t Stat / P-value
Intercept / -1388.72 / 1246.139 / -1.11441 / 0.28264
X1 / 107.5962 / 30.16421 / 3.567017 / 0.002809
X4 / 749.4844 / 207.1954 / 3.617283 / 0.002534
X5 / 22.82502 / 9.531333 / 2.394735 / 0.030133
X2 / -879.915 / 632.9993 / -1.39007 / 0.184792

X1, X4, X5, and X3:

Coefficients / Standard Error / t Stat / P-value
Intercept / -2776.57 / 784.0729 / -3.54121 / 0.002962
X1 / 134.6924 / 30.38378 / 4.433037 / 0.000484
X4 / 959.9247 / 195.1911 / 4.91787 / 0.000186
X5 / 20.85893 / 10.18438 / 2.048129 / 0.058472
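X3 / -9.42256 / 32.43436 / -0.29051 / 0.775403

Neither 4-variable model gives low p-values for all four variables (X2 has a p-value of 0.184792; X3 has a p-value of 0.775403) -- STOP -- use the model utilizing X1, X4, and X5.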
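
The walk-through in this example can be replayed from the printed p-values alone. The sketch below is a hedged illustration: the p-values are copied from the printouts above, but the cutoff `ALPHA = 0.05`, the helper name `pick`, and the rule of choosing the qualifying model whose largest p-value is smallest are assumptions, since the text only speaks of "low" and "lowest" p-values.

```python
ALPHA = 0.05  # assumed cutoff for a "low" p-value

# One dict per step. Each key is the variable being added at that step;
# each value holds the p-values of all x variables in that trial model.
STEPS = [
    {  # Step 1: 1-variable regressions
        "X1": {"X1": 0.004241},
        "X2": {"X2": 0.000126},
        "X3": {"X3": 0.017116},
        "X4": {"X4": 7.84e-05},
        "X5": {"X5": 0.049403},
    },
    {  # Step 2: 2-variable regressions with X4
        "X1": {"X4": 8.94e-06, "X1": 0.000362},
        "X2": {"X4": 0.010657, "X2": 0.017223},
        "X3": {"X4": 0.001196, "X3": 0.281504},
        "X5": {"X4": 0.000415, "X5": 0.236057},
    },
    {  # Step 3: 3-variable regressions with X1 and X4
        "X2": {"X1": 0.006037, "X4": 0.000728, "X2": 0.334647},
        "X3": {"X1": 0.000822, "X4": 3.41e-05, "X3": 0.562546},
        "X5": {"X1": 0.000125, "X4": 3.61e-05, "X5": 0.043515},
    },
    {  # 4-variable regressions with X1, X4, and X5
        "X2": {"X1": 0.002809, "X4": 0.002534, "X5": 0.030133, "X2": 0.184792},
        "X3": {"X1": 0.000484, "X4": 0.000186, "X5": 0.058472, "X3": 0.775403},
    },
]


def pick(trials, alpha=ALPHA):
    """Return the added variable of the best qualifying trial model, or None."""
    # Keep only trial models in which every p-value is below alpha.
    qualifying = {
        added: max(pvals.values())
        for added, pvals in trials.items()
        if max(pvals.values()) < alpha
    }
    if not qualifying:
        return None
    # Among those, take the model whose largest p-value is smallest.
    return min(qualifying, key=qualifying.get)


model = []
for trials in STEPS:
    added = pick(trials)
    if added is None:
        break  # STOP: keep the model built so far
    model.append(added)

print(model)
```

Under these assumptions the replay selects X4, then X1, then X5, and stops when neither 4-variable model has all p-values below the cutoff.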