BA 275    Modeling Relationships

Winter 2007    Multiple Regression Analysis

Multiple Regression with Two Independent Variables

A collector of antique grandfather clocks believes that the price received for a clock at an antique auction increases with the age of the clock and with the number of bidders. Thus, the hypothesized model is

Y = α + β1 X1 + β2 X2 + ε

where Y = Auction price, X1 = Age of clock (years), and X2 = Number of bidders.

1. Data Collection
Age (X1, years) / Bidders (X2) / Price (Y)
127 / 13 / 1235
115 / 12 / 1080
127 / 7 / 846
150 / 9 / 1522
156 / 6 / 1047
182 / 11 / 1979
156 / 12 / 1822
132 / 10 / 1253
137 / 9 / 1297
113 / 9 / 946
137 / 15 / 1713
117 / 11 / 1024
137 / 8 / 1147
153 / 6 / 1092
117 / 13 / 1152
126 / 10 / 1336
170 / 14 / 2131
182 / 8 / 1550
162 / 11 / 1884
184 / 10 / 2041
143 / 6 / 854
159 / 9 / 1483
108 / 14 / 1055
175 / 8 / 1545
108 / 6 / 729
179 / 9 / 1792
111 / 15 / 1175
187 / 8 / 1593
111 / 7 / 785
115 / 7 / 744
194 / 5 / 1356
168 / 7 / 1262
2. Initial Analysis



3. Fitting the Model and 4. Assessing the Model
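The printouts for these steps came from STATGRAPHICS PLUS. As a rough stand-in, the sketch below fits the two-predictor model by ordinary least squares with NumPy (an assumption; Python is not the course software) and reports the summary quantities usually read off the printout. The names `clocks`, `coef`, and so on are illustrative.

```python
import numpy as np

# (age in years, number of bidders, auction price), copied from the data table
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T
n, k = len(price), 2                      # observations, predictors

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

residuals = price - X @ coef
sse = np.sum(residuals ** 2)              # unexplained variation
sst = np.sum((price - price.mean()) ** 2) # total variation
ssr = sst - sse                           # explained variation

r2 = ssr / sst
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
std_error = np.sqrt(sse / (n - k - 1))    # standard error of estimate
f_stat = (ssr / k) / (sse / (n - k - 1))  # overall F statistic

print(f"intercept = {coef[0]:.2f}, b1 (age) = {coef[1]:.3f}, b2 (bidders) = {coef[2]:.3f}")
print(f"SSE = {sse:.0f}, SST = {sst:.0f}, s = {std_error:.1f}")
print(f"R-sq = {r2:.3f}, R-sq(adj) = {adj_r2:.3f}, F = {f_stat:.1f}")
```

The p-values for the F and t tests are not computed here; they would need an F or t distribution function from a statistics library.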



5. Model Diagnostics : Checking the Conditions





  • Are any of the required conditions violated? (Normality, independence, constant variance, and zero mean of the errors.)
  • When the number of bidders is around 10, does the current model tend to overestimate or underestimate Price? What about when the number of bidders is around 5 or 15?
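One way to approach the second question without the residual plot is to average the residuals within ranges of the number of bidders. The sketch below refits the model with NumPy and does that; the three ranges (7 or fewer, 8 to 12, 13 or more) are an arbitrary choice made for illustration, not part of the handout.

```python
import numpy as np

# (age in years, number of bidders, auction price), copied from the data table
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 846), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

age, bidders, price = clocks.T
X = np.column_stack([np.ones(len(price)), age, bidders])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Residual = actual - fitted. A positive mean residual in a group means the
# model tends to underestimate Price there; a negative one, overestimate.
residuals = price - X @ coef

groups = {
    "around 5  (7 or fewer)": bidders <= 7,
    "around 10 (8 to 12)": (bidders >= 8) & (bidders <= 12),
    "around 15 (13 or more)": bidders >= 13,
}
for label, mask in groups.items():
    print(f"{label}: n = {mask.sum():2d}, mean residual = {residuals[mask].mean():8.1f}")
```

If the group means differ in sign in a systematic pattern, that suggests the relationship with the number of bidders is not captured by a single straight-line term.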

6. Model Selection
  • After trying several models, only a few passed all the tests and satisfied the required conditions. The following table summarizes the STATGRAPHICS PLUS output for each of them. Which of the competing models should be chosen as the final model, and why? (Assume that the current model also passed all the tests and satisfied the required conditions.)

Candidate / SSResidual / Std. Error of Est. / R-sq / R-sq(adj) / # of X’s
Current / ____ / ____ / ____ / ____ / 2
A / 492317 / 132.6 / 0.912 / 0.854 / 3
B / 489735 / 134.7 / 0.920 / 0.832 / 4
C / 354900 / 130.0 / 0.960 / 0.893 / 10
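The third column of the table can be reproduced from SSResidual, which helps when filling in the row for the current model. The sketch below assumes every candidate was fitted to the same 32 clocks listed in the data table and uses s = sqrt(SSE / (n - k - 1)), where k is the number of X's.

```python
import math

n = 32  # number of clocks in the data table (assumed common to all candidates)

# candidate: (SSResidual, number of X's, third-column value from the table)
candidates = {
    "A": (492317, 3, 132.6),
    "B": (489735, 4, 134.7),
    "C": (354900, 10, 130.0),
}

computed = {}
for name, (sse, k, table_value) in candidates.items():
    # Standard error of estimate: residual sum of squares per error degree of freedom
    computed[name] = math.sqrt(sse / (n - k - 1))
    print(f"{name}: computed s = {computed[name]:.1f}, table = {table_value}")
```

The computed values agree with the table to one decimal place, which supports reading that column as the standard error of estimate.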
7. Using the (Final) Model
  • What is the total variation in auction prices? How much of it is explained by the model?
  • If there are 10 bidders and the clock is 100 years old, what is the expected auction price?
  • If Age is held fixed and the number of bidders increases from 10 to 11, by how much is Price expected to increase?
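The three questions map onto standard quantities from the fitted model. The notation below is an assumption for illustration: b0, b1, and b2 stand for the estimated intercept and slopes from the printout, and SSE is the residual sum of squares.

```latex
\begin{align*}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\text{SSR} = \text{SST} - \text{SSE}, \qquad
R^2 = \frac{\text{SSR}}{\text{SST}} \\
\hat{y}(X_1 = 100,\; X_2 = 10) &= b_0 + 100\, b_1 + 10\, b_2 \\
\hat{y}(X_1,\; 11) - \hat{y}(X_1,\; 10) &= b_2
\end{align*}
```

Note that the youngest clock in the data is 108 years old, so a prediction at Age = 100 extrapolates slightly below the observed range and should be read with some caution.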

Multiple Regression with One Dummy Variable

Sometimes a predictor (independent variable) X can take only two possible values, e.g. gender (male or female). Such qualitative variables are handled in a multiple regression analysis by coding them as 0-1 variables, which are also called “dummy” variables.
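As a minimal sketch of the coding step, the Python snippet below turns a two-valued variable into a 0-1 column. The helper `to_dummy` and the sample values are hypothetical, and which level is coded as 1 is an arbitrary choice that only changes how the coefficient is read.

```python
def to_dummy(values, level_coded_one):
    """Return 1 where the value equals level_coded_one, else 0."""
    return [1 if v == level_coded_one else 0 for v in values]

# Hypothetical sample of a two-valued predictor
gender = ["female", "male", "male", "female"]

# Code "female" as 1 and "male" as 0
gender_dummy = to_dummy(gender, "female")
print(gender_dummy)  # [1, 0, 0, 1]
```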

A bank would like to develop a model to predict the total sum of money that customers withdraw (Y) from Automatic Teller Machines (ATMs) on a weekend based on the median value of homes (X1) in the neighborhood in which the ATM is located and the location of the ATM (X2) (no = not a shopping center and yes = shopping center). A random sample of 15 ATM locations is selected. The multiple linear regression model:

Y =  + 1 X1 + 2 X2 + 

with normal error terms is expected to be appropriate. Perform a multiple linear regression analysis.
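To see what the dummy variable does, it may help to write the model separately for each location type. The sketch below assumes the coding X2 = 0 for "no" and X2 = 1 for "yes", and writes the intercept as α and the slopes as β1 and β2 (assumed notation).

```latex
\begin{align*}
X_2 = 0 \text{ (not a shopping center)}:\quad E(Y) &= \alpha + \beta_1 X_1 \\
X_2 = 1 \text{ (shopping center)}:\quad E(Y) &= (\alpha + \beta_2) + \beta_1 X_1 \\
E(Y \mid X_2 = 1) - E(Y \mid X_2 = 0) &= \beta_2
\end{align*}
```

Under this model the two location types share the same slope on home value, and β2 is the vertical gap between the two parallel lines.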

Regression Printout

Questions

  1. Write down the fitted model.
  2. Is the assumed model reliable? Why?
  3. What are the values of R² and adjusted R²? For model selection, why do we prefer adjusted R² to R²?
  4. Predict the amount of money withdrawn from an ATM located in a shopping center in a neighborhood where the median value of homes is $200,000.
  5. If the median value of homes increases by $2,000, then the amount of money withdrawn from an ATM located in a shopping center is expected to increase by ______.
  6. If the median value of homes is $200,000, then the amount of money withdrawn from an ATM located in a shopping center is ______, and the amount of money withdrawn from an ATM located outside a shopping center is ______. What is the difference?
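For the question on adjusted R², the usual definition is given below as a reminder. Here n is the sample size, k the number of predictors, and SSE and SST the residual and total sums of squares; the second line plugs in n = 15 and k = 2 for the ATM sample.

```latex
\begin{align*}
R^2_{\text{adj}} &= 1 - \frac{\text{SSE}/(n-k-1)}{\text{SST}/(n-1)}
                  = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} \\
n = 15,\ k = 2:\quad R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{14}{12}
\end{align*}
```

Because the factor (n - 1)/(n - k - 1) grows with k, adding a predictor raises adjusted R² only if it reduces SSE by enough to offset the lost degree of freedom, whereas R² never decreases when a predictor is added.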


Hsieh, P-H