Yale University
School of Management
EMBA-MGT 511 January 18, 2008
Hypothesis Testing and Regression
Due: Feb 1, 2008 before class
Email to: with copy to . Use the subject line MGT511-HW3.
Homework #3
Data for Questions 1-2 in this homework are in the file HW3.xls. The worksheets within HW3.xls have the data for the questions.
1. As part of a study designed to understand how a print publication’s reader composition affects the cost of placing an ad in this publication, the following data were collected for major magazines that were available on the news-stands during the second quarter of 1988. Rates (in $) and Circulation figures (in 000’s) were taken from Adweek’s Marketer’s Guide to Media (Second Quarter 1988). The rates considered in this study include only the one-time rate for a full page, 4 color insertion (in $). Data on readers’ median age and median household income for each publication were taken from the Simmons 1987 Study of Media and Markets.
The data are posted on the website under the worksheet Ad Rates. The variables include the magazines’ names, the cost for a one-time ad (COSTAD), the circulation (CIRC), median income (MEDINC) and median age (MEDAGE). Missing data are shown as “*”.
Before we do the analysis, we first eliminate all observations with missing data and transcription errors. While it is not clear, that this does not bias our sample, given our inability to correct the transcription errors or fill out the missing values, we assume that observations with transcription errors and missing data are truly random and therefore eliminating these observations will not bias the results. After we eliminate these observations, we are left with 87 observations for our analysis.
- Plot COSTAD against CIRC. Is there visual evidence of a relationship between these two variables? If so, can this relationship be explained causally?
Soln: The plot of COSTAD vs. CIRC is provided. The scatter diagram does indeed suggest a relationship between these two variables. Logic suggests that a higher circulation causes the cost of the insert to be higher, and not the other way around. All other things equal, a larger audience increases the attractiveness of a publication, and hence the demand for space increases which explains the higher cost.
- Take a natural log (LN in Excel) transform of COSTAD and natural log transform of CIRC. Plot the resulting transformed variables (Ln(COSTAD) vs. Ln(CIRC)). Would a simple linear regression model capture the relationship between the variables (Ln(COSTAD) as the criterion variable and Ln(CIRC) as the predictor variable) properly? Why or why not?
Soln: The plot of Ln(COSTAD) vs. Ln(CIRC) is provided. As the plot suggests, a simple linear regression model can be fit to the transformed data, since the data after the transformation appear to be far more linear than is true for the raw data in the graph of problem a.
- Run a multiple regression with Ln(COSTAD) as the criterion (dependent) variable and Ln(CIRC), MEDINC, and MEDAGE as the predictor (independent) variables. Interpret the slope coefficients for ln(Circ), MEDINC and MEDAGE precisely (in terms of how a change in the predictor variable affects the criterion variable).
Soln: The Excel printout for the multiple regression is provided below.
SUMMARY OUTPUTRegression Statistics
Multiple R / 0.92568
R Square / 0.856883
Adjusted R Square / 0.85171
Standard Error / 0.223981
Observations / 87
ANOVA
df / SS / MS / F / Significance F
Regression / 3 / 24.93056 / 8.310188 / 165.6484 / 6.22E-35
Residual / 83 / 4.163913 / 0.050168
Total / 86 / 29.09448
Coefficients / Standard Error / t Stat / P-value
Intercept / 4.855259 / 0.301328 / 16.11288 / 2.8E-27
LN(CIRC) / 0.652408 / 0.029606 / 22.03615 / 1.96E-36
MEDINC / 4.01E-05 / 4.77E-06 / 8.399147 / 1.04E-12
MEDAGE / -0.01166 / 0.004119 / -2.83172 / 0.005808
1. All else being held constant, a 1% increase in circulation will lead to a 0.65% increase in the cost of advertising.
This means a magazine with higher circulation commands a premium over a magazine with less circulation.
2. All else being held constant, a $1000 increase in the median income of the readership, increases the cost of advertising by 0.04%. (Easy to interpret)
All else being held constant, a $1 increase in the median income of the readership, increases the cost of advertising by 4.01e-5%. (Literally)
This means Publications with a more affluent readership command a premium over publications with a less affluent readership.
3. All else being held constant an increase of 1 year in the median age of the readership reduces the cost of advertising by 0.012%.
This means publications with a younger audience command a premium over publications with an older audience.
- In 1988, what would have been the expected cost for a one-page insert in a publication with a circulation of 1,200,000 and an audience whose median income was $40,000 and median age was 40 years?
Soln: The multiple regression model is:
Substituting, the values for CIRC (=1200), MEDINC (=40000), and MEDAGE (=40) into the above equation, we have:
2. BestBooks has noticed that sales of fiction books are higher during the holiday quarter (fourth quarter) than in other quarters. Periodically, BestBooks sends out $1 coupons electronically to customers. Management believes that the response to coupons is greater during the holiday (fourth) quarter than in other quarters. Data are available in the worksheet “BestBooks”.
Sales: Units sales of fiction books in a quarter
Price: Average price of fiction books during the quarter
Coupons: Number of Coupons sent out (in thousands)
Quarter: Coded as 1, 2,3,4 for each of the four quarters.
- Construct a dummy variable called “Holiday” that distinguishes only the fourth quarter from the other quarters. In other words, the other quarters are treated as if there is no difference between them. Run a regression of Sales against Price, Coupons and Holiday. Test the null hypothesis of zero explanatory power.
SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.998839
R Square / 0.99768
Adjusted R Square / 0.997487
Standard Error / 95.94016
Observations / 40
ANOVA
df / SS / MS / F / Significance F
Regression / 3 / 1.42E+08 / 47498266 / 5160.323 / 1.85E-47
Residual / 36 / 331362.5 / 9204.514
Total / 39 / 1.43E+08
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95%
Intercept / 19776.8 / 348.624 / 56.72817 / 8.18E-37 / 19069.76 / 20483.84
Price / -576.438 / 24.20615 / -23.8137 / 1.26E-23 / -625.53 / -527.346
Coupon(in thousands) / 207.4475 / 13.09299 / 15.84416 / 8.29E-18 / 180.8937 / 234.0013
Holiday / 4234.295 / 35.03834 / 120.8475 / 1.43E-48 / 4163.234 / 4305.356
Ho: the model has no true explanatory power
Ha: it does
The F-test has a p-value < .001. Reject Ho.
- In this model, what is the effect of Coupons in a Holiday quarter? What is it in any other quarter?
The slope for Coupon is statistically significant (p<.001). The estimated effect is that for each additional thousand coupons sales is expected to increase by 207 units, holding Price constant, in the holiday quarter and in any other quarter.
- Create a new regression model in which you can test the null hypothesis of no difference in coupon response during the holiday quarter relative to other quarters.
(Hint: Create an interaction variable Coupons*Holiday, and add this variable to the model; run a regression with this variable added to the variables used in (a)).
SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.999007
R Square / 0.998014
Adjusted R Square / 0.997787
Standard Error / 90.0194
Observations / 40
ANOVA
df / SS / MS / F / Significance F
Regression / 4 / 1.43E+08 / 35635635 / 4397.565 / 9.55E-47
Residual / 35 / 283622.2 / 8103.492
Total / 39 / 1.43E+08
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95%
Intercept / 20131.7 / 358.3018 / 56.18644 / 6.75E-36 / 19404.31 / 20859.09
Price / -586.032 / 23.05372 / -25.4203 / 3.77E-24 / -632.834 / -539.231
Coupon(in thousands) / 195.2522 / 13.27273 / 14.71078 / 1.49E-16 / 168.3071 / 222.1974
Holiday / 2365.002 / 770.8436 / 3.068069 / 0.004142 / 800.104 / 3929.899
Hol*Coup / 84.95625 / 35.00167 / 2.427205 / 0.020501 / 13.89899 / 156.0135
- In the new model, what is the marginal effect on sales if BestBooks sends out a thousand more coupons during a regular quarter?
The effect of Coupon is statistically significant (p<.001). The estimated effect is to increase sales by 195 units (for thousand more coupons in a regular quarter), holding Price constant.
- In the new model, what is the marginal effect on sales if BestBooks sends out a thousand more coupons during the holiday quarter?
The interaction effect is also statistically significant (p<.05). The estimated effect is to increase sales by 195+85=280 units (for thousand more coupons in a holiday quarter), holding Price constant.
3. The Career Development Office (CDO) decided that it should provide
guidance to SOM students about the starting salaries that students can expect upon graduation. Mark Case asked you to gather data on starting salaries (in $000’s excluding bonuses), the functional job activity and the student’s work experience prior to SOM. You select 50 observations on the class of 2000. This sample is restricted to jobs in consulting, finance and marketing.
The variables are: Experience (number of years), Finance (1 if the job is in Finance, 0 otherwise), Consulting (1 if the job is in Consulting, 0 otherwise), Fin*Exp (the product of Finance and Experience) and Cons*Exp ( the product of Consulting and Experience).
You may assume that your answers to the questions below are understood to be conditional upon the restricted data in the sample.
Regression Statistics
Multiple R / 0.995965
R Square / 0.991946
Adjusted R Square / 0.991031
Standard Error / 1.437934
Observations / 50
ANOVA
df / SS / MS / F / Significance F
Regression / 5 / 11204.7 / 2240.941 / 1083.808 / 7.13E-45
Residual / 44 / 90.97681 / 2.067655
Total / 49 / 11295.68
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95%
Intercept / 88.14063 / 0.741094 / 118.9331 / 7.79E-57 / 86.64705 / 89.6342
Experience / 1.928125 / 0.148219 / 13.00863 / 1.11E-16 / 1.62941 / 2.22684
Finance / 8.725047 / 1.096658 / 7.956033 / 4.67E-10 / 6.514878 / 10.93522
Consulting / 25.47798 / 1.072513 / 23.75541 / 9.96E-27 / 23.31647 / 27.63949
Fin*Exp / 3.206203 / 0.214757 / 14.92945 / 7.79E-19 / 2.773389 / 3.639018
Cons*Exp / 0.150945 / 0.190773 / 0.791227 / 0.433056 / -0.23353 / 0.535422
a. What is the predicted starting salary for a 2000 SOM graduate with a job in Consulting who has no prior working experience?
88.14 ± 25.48 = 113.62 → $113,620
b. What is the predicted starting salary for a 2000 SOM graduate with a job in Marketing who has 8 years prior working experience?
88.14 + 1.93 (8) = 103.58 → $103,580
c. What is the predicted starting salary for a 2000 SOM graduate with a job in Finance who has 4 years prior working experience?
88.14 + 8.73 + 1.93(4) + 3.21 (4)
= 117.43 → $117,430
d. Show in one graph the estimated relations between starting salary and experience for each of the functional areas. Provide numbers for the intercepts and slopes, and label all lines.
2